Abstract

The paper studies the problem of distributed static parameter (vector) estimation in sensor networks with nonlinear
observation models and imperfect inter-sensor communication. It introduces the concept of separably estimable
observation models, which generalizes the observability condition for linear centralized estimation to nonlinear
distributed estimation. It studies two algorithms, NU (with its linear counterpart CU) and NCU, for distributed
estimation in separably estimable models. It proves consistency (all sensors reach consensus almost surely and converge
to the true parameter value), asymptotic unbiasedness, and asymptotic normality of these algorithms. Both algorithms
are characterized by appropriately chosen decaying weight sequences in the estimate update rule. While the algorithm
NU is analyzed in the framework of stochastic approximation theory, the algorithm NCU exhibits mixed time-scale
behavior and biased perturbations and requires a different approach, which is developed in the paper.
I. Introduction

A. Background and Motivation 

Wireless sensor network (WSN) applications generally consist of a large number of sensors that coordinate to
perform a task in a distributed fashion. Unlike fusion-center based applications, there is no center, and the task is
performed locally at each sensor with intermittent inter-sensor message exchanges. In a coordinated environment
monitoring or surveillance task, this translates to each sensor observing a part of the field of interest. With such local
information alone, a particular sensor cannot get a reasonable estimate of the field. The sensors therefore
need to cooperate, and this is achieved by intermittent data exchanges among the sensors, whereby each sensor
fuses its version of the estimate from time to time with those of other sensors with which it can communicate
(in this context, see [1], [2], [3], [4] for a treatment of general distributed stochastic algorithms). We consider the
above problem in this paper in the context of distributed parameter estimation in WSNs. As an abstraction of the
environment, we model it by a static vector parameter, whose dimension, $M$, can be arbitrarily large. We assume
that each sensor receives noisy measurements (not necessarily additive) of only a part of the parameter vector. More
specifically, if $M_n$ is the dimension of the observation space of the $n$-th sensor, $M_n \ll M$. Assuming that the
rate of receiving observations at each sensor is comparable to the data exchange rate among sensors, each sensor
updates its estimate at time index $i$ by fusing it appropriately with the observation (innovation) received at $i$ and
the estimates at $i$ of those sensors with which it can communicate at $i$. We propose and study two generic recursive
distributed iterative estimation algorithms in this paper, namely, NU and NCU, for distributed parameter estimation
with possibly nonlinear observation models at each sensor. As is required even by centralized estimation schemes,
for the estimate sequences generated by the NU and NCU algorithms at each sensor to have desirable statistical
properties, we impose an observability condition. To this end, we introduce a generic observability condition, the
separably estimable condition for distributed parameter estimation in nonlinear observation models, which generalizes
the observability condition of centralized parameter estimation.

The inter-sensor communication is quantized and the communication links among sensors are subject to random
failures. This is appropriate, for example, in digital communication in WSNs when the data exchanges among a
sensor and its neighbors are quantized, and the communication channels may fail, e.g., when packet dropouts
occur randomly. We consider a very generic model of temporally independent link failures, whereby it is assumed
that the sequence of network Laplacians, $\{L(i)\}_{i \ge 0}$, is i.i.d. with mean $\bar{L}$ satisfying $\lambda_2(\bar{L}) > 0$. We do not
make any distributional assumptions on the link failure model. Although the link failures, and so the Laplacians, are
independent at different times, during the same iteration, the link failures can be spatially dependent, i.e., correlated.
This is more general and subsumes the erasure network model, where the link failures are independent over space
and time. Wireless sensor networks motivate this model since interference among the wireless communication
channels correlates the link failures over space, while, over time, it is still reasonable to assume that the channels
are memoryless or independent. In particular, we do not require that the random instantiations of the communication
graph be connected; in fact, it is possible for all these instantiations to be disconnected. We only require that
the graph stays connected on average. This is captured by requiring that $\lambda_2(\bar{L}) > 0$, enabling us to capture a broad
class of asynchronous communication models, as will be explained in the paper.

As is required even by centralized estimation schemes, for the estimate sequences generated by the NU and NCU
algorithms to have desirable statistical properties, we need to impose some observability condition. To this end, we
introduce a generic observability condition, the separably estimable condition for distributed parameter estimation
in nonlinear observation models, which generalizes the observability condition of centralized parameter estimation.
To motivate the separably estimable condition for nonlinear problems, we start with the linear model, for which
it reduces to a rank condition on the overall observability Grammian. We propose the algorithm CU for the
linear model and, using stochastic approximation, show that the estimate sequence generated at each sensor is
consistent, asymptotically unbiased, and asymptotically normal. We explicitly characterize the asymptotic variance
and, in certain cases, compare it with the asymptotic variance of a centralized scheme. The CU algorithm can
be regarded as a generalization of consensus algorithms (see, for example, [5], [6], [7], [8], [9], [10], [11], [12],
[13], [14], [15], [16], [17]), the latter being a specific case of CU with no innovations. The algorithm NU
is the natural generalization of CU to nonlinear separably estimable models. Under reasonable assumptions
on the model, we prove consistency, asymptotic unbiasedness, and asymptotic normality of the algorithm NU.
An important aspect of these algorithms is the time-varying weight sequences (decaying to zero as the iterations
progress) associated with the consensus and innovation updates. The algorithm NU (and its linear counterpart
CU) is characterized by the same decay rate of the consensus and innovation weight sequences and, hence, its
analysis falls under the framework of stochastic approximation. The algorithm NU provides desirable performance
guarantees (consistency, asymptotic unbiasedness, and asymptotic normality), though it requires further assumptions
on the separably estimable observation models. We thus introduce the NCU algorithm, which leads to consistent
and asymptotically unbiased estimators at each sensor for all separably estimable models. In the context of stochastic
algorithms, NCU can be viewed as exhibiting mixed time-scale behavior (the weight sequences associated with
the consensus and innovation updates decay at different rates) and consisting of biased perturbations (a detailed
explanation is provided in the paper). The NCU algorithm does not fall under the purview of standard stochastic
approximation theory, and its analysis requires an altogether different framework, as developed in the paper. The
algorithm NCU is thus more reliable than the NU algorithm, as the latter requires further assumptions on the
separably estimable observation models. On the other hand, in cases where the NU algorithm is applicable, it
provides convergence rate guarantees (for example, asymptotic normality), which follow from standard stochastic
approximation theory, while NCU does not fall under the purview of standard stochastic approximation theory and,
hence, does not inherit these convergence rate properties.

We comment on the relevant recent literature on distributed estimation in WSNs. The papers [18], [19], [20],
[21] study the estimation problem in static networks, where either the sensors take a single snapshot of the field at
the start and then initiate distributed consensus protocols (or, more generally, distributed optimization, as in [19]) to
fuse the initial estimates, or the observation rate of the sensors is assumed to be much slower than the inter-sensor
communication rate, thus permitting a separation of the two time-scales. In contrast, our work considers new
observations at every time iteration, and the consensus and observation (innovation) updates are incorporated in the
same iteration. More relevant to our present work are [22], [23], [24], [25], which consider the linear estimation
problem in non-random networks, where the observation and consensus protocols are incorporated in the same
iteration. In [22], [24] the distributed linear estimation problems are treated in the context of distributed
least-mean-square (LMS) filtering, where constant weight sequences are used to prove mean-square stability of the
filter. The use of non-decaying combining weights in [22], [24], [25] leads to a residual error; however, under appropriate
assumptions, these algorithms can be adapted for tracking certain time-varying parameters. The distributed LMS
algorithm in [23] also considers decaying weight sequences, thereby establishing $\mathcal{L}_2$ convergence to the true
parameter value. Apart from treating generic separably estimable nonlinear observation models, in the linear case,
our algorithm CU leads to asymptotic normality in addition to consistency and asymptotic unbiasedness in random
time-varying networks with quantized inter-sensor communication.

We briefly comment on the organization of the rest of the paper. The rest of this section introduces notation and
preliminaries to be adopted throughout the paper. To motivate the generic nonlinear problem, we study the linear
case (algorithm CU) in Section II. Section III studies the generic separably estimable models and the algorithm
NU, whereas algorithm NCU is presented in Section IV. Finally, Section V concludes the paper. Four Appendices
provide detailed proofs of several Lemmas and Theorems presented in Section IV.



B. Notation

For completeness, this subsection sets notation and presents preliminaries on algebraic graph theory, matrices,
and dithered quantization to be used in the sequel.

Preliminaries. We denote the $k$-dimensional Euclidean space by $\mathbb{R}^{k \times 1}$. The $k \times k$ identity matrix is denoted by
$I_k$, while $\mathbf{1}_k$, $\mathbf{0}_k$ denote, respectively, the column vectors of ones and zeros in $\mathbb{R}^{k \times 1}$. We also define the rank-one
$k \times k$ matrix $P_k$ by

$$P_k = \frac{1}{k}\mathbf{1}_k\mathbf{1}_k^T \tag{1}$$

The only non-zero eigenvalue of $P_k$ is one, and the corresponding normalized eigenvector is $(1/\sqrt{k})\,\mathbf{1}_k$. The
operator $\|\cdot\|$ applied to a vector denotes the standard Euclidean 2-norm, while applied to matrices it denotes the
induced 2-norm, which is equivalent to the matrix spectral radius for symmetric matrices.

We assume that the parameter to be estimated belongs to a subset $\mathcal{U}$ of the Euclidean space $\mathbb{R}^{M \times 1}$. Throughout
the paper, the true (but unknown) value of the parameter is denoted by $\theta^*$. We denote a canonical element of $\mathcal{U}$
by $\theta$. The estimate of $\theta^*$ at time $i$ at sensor $n$ is denoted by $\mathbf{x}_n(i) \in \mathbb{R}^{M \times 1}$. Without loss of generality, we assume
that the initial estimate, $\mathbf{x}_n(0)$, at time $0$ at sensor $n$ is a non-random quantity.

Throughout, we assume that all the random objects are defined on a common measurable space, $(\Omega, \mathcal{F})$. In case
the true (but unknown) parameter value is $\theta^*$, the probability and expectation operators are denoted by $\mathbb{P}_{\theta^*}[\cdot]$ and
$\mathbb{E}_{\theta^*}[\cdot]$, respectively. When the context is clear, we abuse notation by dropping the subscript. Also, all inequalities
involving random variables are to be interpreted a.s. (almost surely).
Spectral graph theory. We review elementary concepts from spectral graph theory. For an undirected graph
$G = (V, E)$, $V = [1 \cdots N]$ is the set of nodes or vertices, $|V| = N$, and $E$ is the set of edges, $|E| = M$, where $|\cdot|$
is the cardinality. The unordered pair $(n, l) \in E$ if there exists an edge between nodes $n$ and $l$. We only consider
simple graphs, i.e., graphs devoid of self-loops and multiple edges. A graph is connected if there exists a path¹
between each pair of nodes. The neighborhood of node $n$ is

$$\Omega_n = \{\, l \in V \mid (n, l) \in E \,\} \tag{2}$$

Node $n$ has degree $d_n = |\Omega_n|$ (the number of edges with $n$ as one end point). The structure of the graph can be
described by the symmetric $N \times N$ adjacency matrix, $A = [A_{nl}]$, $A_{nl} = 1$ if $(n, l) \in E$, $A_{nl} = 0$ otherwise. Let
the degree matrix be the diagonal matrix $D = \mathrm{diag}(d_1 \cdots d_N)$. The graph Laplacian matrix, $L$, is

$$L = D - A \tag{3}$$

The Laplacian is a positive semidefinite matrix; hence, its eigenvalues can be ordered as

$$0 = \lambda_1(L) \le \lambda_2(L) \le \cdots \le \lambda_N(L) \tag{4}$$

The smallest eigenvalue $\lambda_1(L)$ is always equal to zero, with $(1/\sqrt{N})\,\mathbf{1}_N$ being the corresponding normalized
eigenvector. The multiplicity of the zero eigenvalue equals the number of connected components of the network;
for a connected graph, $\lambda_2(L) > 0$. This second eigenvalue is the algebraic connectivity or the Fiedler value of the
network; see [26], [27], [28] for detailed treatment of graphs and their spectral theory.
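
As a small illustration of definitions (2)-(4), the following sketch builds $L = D - A$ for a hypothetical three-node path graph and checks two facts used above: every row of $L$ sums to zero (so $\lambda_1(L) = 0$ with eigenvector proportional to $\mathbf{1}_N$), and $x^T L x$ equals the sum of squared differences across edges, a standard identity that makes positive semidefiniteness evident. The function names, the 0-based node labels, and the test vector are assumptions made for this example only.

```python
def laplacian(num_nodes, edges):
    """Build L = D - A (eq. (3)) for a simple undirected graph.

    `edges` is a list of unordered pairs (n, l) with 0-based node labels."""
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for n, l in edges:
        L[n][l] -= 1.0      # -A_{nl}
        L[l][n] -= 1.0      # -A_{ln}, by symmetry
        L[n][n] += 1.0      # contributes to degree d_n
        L[l][l] += 1.0      # contributes to degree d_l
    return L


def quadratic_form(L, x):
    """Return x^T L x."""
    N = len(L)
    return sum(x[n] * L[n][l] * x[l] for n in range(N) for l in range(N))


# Hypothetical example: the path graph 0 - 1 - 2.
edges = [(0, 1), (1, 2)]
L = laplacian(3, edges)
x = [0.5, -1.0, 2.0]

row_sums = [sum(row) for row in L]                        # L * 1 = 0
edge_energy = sum((x[n] - x[l]) ** 2 for n, l in edges)   # equals x^T L x >= 0
```

Because the quadratic form is a sum of squares over edges, it vanishes exactly on vectors that are constant on each connected component, which is why the multiplicity of the zero eigenvalue counts the components.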

Kronecker product. Since we are dealing with vector parameters, most of the matrix manipulations will involve
Kronecker products. For example, the Kronecker product of the $N \times N$ matrix $L$ and $I_M$ will be an $NM \times NM$
matrix, denoted by $L \otimes I_M$. We will often deal with matrices of the form $C = [I_{NM} - bL \otimes I_M - aI_{NM} - P_N \otimes I_M]$.
It follows from the properties of Kronecker products and the matrices $L$, $P_N$ that the eigenvalues of this matrix $C$
are $-a$ and $1 - b\lambda_i(L) - a$, $2 \le i \le N$, each being repeated $M$ times.
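
The eigenvalue claim can be verified directly. The short derivation below is a sketch; the symbols $u_i$ (orthonormal eigenvectors of $L$, with $u_1 = \mathbf{1}_N/\sqrt{N}$) and $e_m$ (the canonical basis vectors of $\mathbb{R}^{M \times 1}$) are introduced here for illustration and do not appear in the original text. It uses only the mixed-product rule $(A \otimes B)(u \otimes v) = (Au) \otimes (Bv)$.

```latex
% Sketch: eigenpairs of C = I_{NM} - b L \otimes I_M - a I_{NM} - P_N \otimes I_M.
% Assumed notation: L u_i = \lambda_i(L) u_i, u_1 = \mathbf{1}_N/\sqrt{N}, u_i \perp \mathbf{1}_N for i \ge 2.
\begin{align*}
P_N u_1 = u_1,\ L u_1 = 0
  &\;\Rightarrow\; C\,(u_1 \otimes e_m) = (1 - 0 - a - 1)\,(u_1 \otimes e_m) = -a\,(u_1 \otimes e_m), \\
P_N u_i = 0,\ L u_i = \lambda_i(L)\,u_i
  &\;\Rightarrow\; C\,(u_i \otimes e_m) = \bigl(1 - b\lambda_i(L) - a\bigr)\,(u_i \otimes e_m), \qquad 2 \le i \le N.
\end{align*}
% Each eigenvalue is repeated M times because m ranges over 1, ..., M.
```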

We now review results from statistical quantization theory.

Quantizer: We assume that all sensors are equipped with identical quantizers, which uniformly quantize each
component of the $M$-dimensional estimates by the quantizing function $\mathbf{q}(\cdot) : \mathbb{R}^{M \times 1} \to \mathcal{Q}^M$. For $\mathbf{y} \in \mathbb{R}^{M \times 1}$ the
channel input,

$$\mathbf{q}(\mathbf{y}) = [k_1\Delta, \cdots, k_M\Delta]^T, \quad (k_m - \tfrac{1}{2})\Delta \le y_m < (k_m + \tfrac{1}{2})\Delta, \quad 1 \le m \le M \tag{5}$$

$$\mathbf{q}(\mathbf{y}) = \mathbf{y} + \mathbf{e}(\mathbf{y}), \quad -\tfrac{\Delta}{2}\mathbf{1}_M < \mathbf{e}(\mathbf{y}) \le \tfrac{\Delta}{2}\mathbf{1}_M, \quad \forall \mathbf{y} \tag{6}$$

where $\mathbf{e}(\mathbf{y})$ is the quantization error and the inequalities are interpreted component-wise. The quantizer alphabet is

$$\mathcal{Q}^M = \left\{ [k_1\Delta, \cdots, k_M\Delta]^T \;\middle|\; k_i \in \mathbb{Z},\ \forall i \right\} \tag{7}$$

We take the quantizer alphabet to be countable because no a priori bound is assumed on the parameter.

¹ A path between nodes $n$ and $l$ of length $m$ is a sequence $(n = i_0, i_1, \cdots, i_m = l)$ of vertices such that $(i_k, i_{k+1}) \in E$ for all $0 \le k \le m - 1$.

Conditioned on the input, the quantization error $\mathbf{e}(\mathbf{y})$ is deterministic. This strong correlation of the error
with the input creates unacceptable statistical properties. In particular, for iterative algorithms, it leads to error
accumulation and divergence of the algorithm (see the discussion in [29]). To avoid this divergence, we consider
dithered quantization, which endows the quantization error with desirable statistical properties. We briefly review basic
results on dithered quantization, which are needed in the sequel.

Dithered Quantization: Schuchman Conditions. Consider a uniform scalar quantizer $q(\cdot)$ of step-size $\Delta$, where
$y \in \mathbb{R}$ is the channel input. Let $\{y(i)\}_{i \ge 0}$ be a scalar input sequence to which we add a dither sequence $\{\nu(i)\}_{i \ge 0}$
of i.i.d. uniformly distributed random variables on $[-\Delta/2, \Delta/2)$, independent of the input sequence $\{y(i)\}_{i \ge 0}$. This
is a sufficient condition for the dither to satisfy the Schuchman conditions (see [30], [31], [32], [33]). Under these
conditions, the error sequence for subtractively dithered systems ([31]), $\{\varepsilon(i)\}_{i \ge 0}$,

$$\varepsilon(i) = q(y(i) + \nu(i)) - (y(i) + \nu(i)) \tag{8}$$

is an i.i.d. sequence of uniformly distributed random variables on $[-\Delta/2, \Delta/2)$, which is independent of the input
sequence $\{y(i)\}_{i \ge 0}$. To be more precise, this result is valid if the quantizer does not overload, which is trivially
satisfied here as the dynamic range of the quantizer is the entire real line. Thus, by randomizing appropriately the
input to a uniform quantizer, we can render the error independent of the input and uniformly distributed on
$[-\Delta/2, \Delta/2)$. This leads to desirable statistical properties of the error, which we will exploit in this paper.
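
The effect of the dither can be seen in a small simulation. The sketch below is illustrative only: the function names, the step size $\Delta = 1$, the constant input $0.3$, and the seed are assumptions for this example. Without dither, a constant input always yields the same error; with a uniform dither, the error in (8) spreads over an interval of length $\Delta$ and averages to approximately zero.

```python
import math
import random


def quantize(y, delta):
    """Uniform scalar quantizer of step `delta`, as in eq. (5):
    returns k*delta with (k - 1/2)*delta <= y < (k + 1/2)*delta."""
    return delta * math.floor(y / delta + 0.5)


def dithered_errors(inputs, delta, rng):
    """Errors eps(i) = q(y(i) + nu(i)) - (y(i) + nu(i)) of eq. (8),
    with nu(i) drawn i.i.d. uniform on [-delta/2, delta/2)."""
    errors = []
    for y in inputs:
        nu = rng.uniform(-delta / 2.0, delta / 2.0)
        errors.append(quantize(y + nu, delta) - (y + nu))
    return errors


delta = 1.0
inputs = [0.3] * 20000                                   # constant input sequence
undithered = [quantize(y, delta) - y for y in inputs]    # deterministic error
dithered = dithered_errors(inputs, delta, random.Random(0))
mean_dithered = sum(dithered) / len(dithered)            # close to zero
```

The undithered error is the same value at every step, which is the input-dependent behavior that the dither removes.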

Random Link Failure. In digital communications, packets may be lost at random times. To account for this, we
allow the links (or communication channels among sensors) to fail, so that the edge set and the connectivity graph of
the sensor network are time varying. Accordingly, the sensor network at time $i$ is modeled as an undirected graph,
$G(i) = (V, E(i))$, and the graph Laplacians as a sequence of i.i.d. Laplacian matrices $\{L(i)\}_{i \ge 0}$. We write

$$L(i) = \bar{L} + \tilde{L}(i), \quad \forall i \ge 0 \tag{9}$$

where the mean $\bar{L} = \mathbb{E}[L(i)]$. We do not make any distributional assumptions on the link failure model. Although
the link failures, and so the Laplacians, are independent at different times, during the same iteration, the link
failures can be spatially dependent, i.e., correlated. This is more general and subsumes the erasure network model,
where the link failures are independent over space and time. Wireless sensor networks motivate this model since
interference among the wireless communication channels correlates the link failures over space, while, over time,
it is still reasonable to assume that the channels are memoryless or independent.

Connectedness of the graph is an important issue. We do not require that the random instantiations $G(i)$ of the
graph be connected; in fact, it is possible for all these instantiations to be disconnected. We only require that
the graph stays connected on average. This is captured by requiring that $\lambda_2(\bar{L}) > 0$, enabling us to capture a broad
class of asynchronous communication models; for example, the random asynchronous gossip protocol analyzed
in [34] satisfies $\lambda_2(\bar{L}) > 0$ and hence falls under this framework.
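
To make "connected on average" concrete, the sketch below considers a hypothetical three-node network in which, at each time, exactly one of the links $(0,1)$ or $(1,2)$ is active, each with probability $1/2$. Every instantiation is disconnected, yet the mean Laplacian has positive algebraic connectivity. The helper names and the power-iteration routine are assumptions of this example (the paper does not prescribe a numerical method); the routine estimates $\lambda_2$ by iterating on $cI - L$ restricted to the subspace orthogonal to the ones vector.

```python
import math


def laplacian(num_nodes, edges):
    """Build L = D - A for a simple undirected graph (0-based node labels)."""
    L = [[0.0] * num_nodes for _ in range(num_nodes)]
    for n, l in edges:
        L[n][l] -= 1.0
        L[l][n] -= 1.0
        L[n][n] += 1.0
        L[l][l] += 1.0
    return L


def _unit_orthogonal_to_ones(v):
    """Remove the component of v along the ones vector and normalize."""
    mean = sum(v) / len(v)
    w = [c - mean for c in v]
    norm = math.sqrt(sum(c * c for c in w))
    return [c / norm for c in w]


def algebraic_connectivity(L, iterations=500):
    """Estimate lambda_2(L) for N >= 2 by power iteration on (c*I - L),
    restricted to the orthogonal complement of the ones vector."""
    N = len(L)
    c = 2.0 * max(L[n][n] for n in range(N)) + 1.0   # Gershgorin: c > lambda_N(L)
    v = _unit_orthogonal_to_ones([float(n) for n in range(N)])
    for _ in range(iterations):
        Lv = [sum(L[n][l] * v[l] for l in range(N)) for n in range(N)]
        v = _unit_orthogonal_to_ones([c * v[n] - Lv[n] for n in range(N)])
    Lv = [sum(L[n][l] * v[l] for l in range(N)) for n in range(N)]
    return sum(v[n] * Lv[n] for n in range(N))        # Rayleigh quotient


L_a = laplacian(3, [(0, 1)])      # instantiation with only link (0, 1)
L_b = laplacian(3, [(1, 2)])      # instantiation with only link (1, 2)
L_mean = [[0.5 * L_a[n][l] + 0.5 * L_b[n][l] for l in range(3)] for n in range(3)]

lam_a = algebraic_connectivity(L_a)
lam_b = algebraic_connectivity(L_b)
lam_mean = algebraic_connectivity(L_mean)
```

Here the mean Laplacian is one half of the path-graph Laplacian, so its algebraic connectivity is positive even though neither instantiation is connected.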
II. Distributed Linear Parameter Estimation: Algorithm CU

In this section, we consider the algorithm CU for distributed parameter estimation when the observation model
is linear. This problem motivates the generic separably estimable nonlinear observation models considered in
Sections III and IV. Subsection II-A sets up the distributed linear estimation problem and presents the algorithm CU.
Subsection II-B establishes the consistency and asymptotic unbiasedness of the CU algorithm, where we show
that, under the CU algorithm, all sensors converge a.s. to the true parameter value, $\theta^*$. Convergence rate analysis
(asymptotic normality) is carried out in Subsection II-C, while Subsection II-D illustrates CU with an example.

A. Problem Formulation: Algorithm CU 

Let $\theta^* \in \mathbb{R}^{M \times 1}$ be an $M$-dimensional parameter that is to be estimated by a network of $N$ sensors. We refer to $\theta$
as a parameter, although it is a vector of $M$ parameters. Each sensor makes i.i.d. observations of noise-corrupted
linear functions of the parameter. We assume the following observation model for the $n$-th sensor:

$$\mathbf{z}_n(i) = H_n(i)\theta^* + \zeta_n(i) \tag{10}$$

where: $\{\mathbf{z}_n(i) \in \mathbb{R}^{M_n \times 1}\}_{i \ge 0}$ is the i.i.d. observation sequence for the $n$-th sensor; $\{\zeta_n(i)\}_{i \ge 0}$ is a zero-mean
i.i.d. noise sequence of bounded variance; and $\{H_n(i)\}_{i \ge 0}$ is an i.i.d. sequence of observation matrices with mean
$\bar{H}_n$ and bounded second moment. For most practical sensor network applications, each sensor observes only a
subset of $M_n$ of the components of $\theta$, with $M_n \ll M$. In such a situation, in isolation, each sensor can
estimate at most only a part of the parameter. However, if the sensor network is connected in the mean sense (see
Section I-B) and under appropriate observability conditions, we will show that it is possible for each sensor to get
a consistent estimate of the parameter $\theta^*$ by means of quantized local inter-sensor communication.

In this subsection, we present the algorithm CU for distributed parameter estimation in the linear observation
model (10). Starting from some initial deterministic estimate of the parameters (the initial states may be random;
we assume them deterministic for notational simplicity), $\mathbf{x}_n(0) \in \mathbb{R}^{M \times 1}$, each sensor generates by a distributed iterative
algorithm a sequence of estimates, $\{\mathbf{x}_n(i)\}_{i \ge 0}$. The parameter estimate $\mathbf{x}_n(i+1)$ at the $n$-th sensor at time $i+1$ is
a function of: its previous estimate; the communicated quantized estimates at time $i$ of its neighboring sensors; and
the new observation $\mathbf{z}_n(i)$. As described in Section I-B, the data is subtractively dithered quantized, i.e., there exists
a vector quantizer $\mathbf{q}(\cdot)$ and a family, $\{\nu^m_{nl}(i)\}$, of i.i.d. uniformly distributed random variables on $[-\Delta/2, \Delta/2)$
such that the quantized data received by the $n$-th sensor from the $l$-th sensor at time $i$ is $\mathbf{q}(\mathbf{x}_l(i) + \nu_{nl}(i))$, where
$\nu_{nl}(i) = [\nu^1_{nl}(i), \cdots, \nu^M_{nl}(i)]^T$. It then follows from the discussion in Section I-B that the quantization error,
$\varepsilon_{nl}(i) \in \mathbb{R}^{M \times 1}$, given by (8), is a random vector whose components are i.i.d. uniform on $[-\Delta/2, \Delta/2)$ and
independent of $\mathbf{x}_l(i)$.

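Algorithm CU. Based on the current state $\mathbf{x}_n(i)$, the quantized exchanged data $\{\mathbf{q}(\mathbf{x}_l(i) + \nu_{nl}(i))\}_{l \in \Omega_n(i)}$, and
the observation $\mathbf{z}_n(i)$, we update the estimate at the $n$-th sensor by the following distributed iterative algorithm:

$$\mathbf{x}_n(i+1) = \mathbf{x}_n(i) - \alpha(i)\Bigl[\, b \sum_{l \in \Omega_n(i)} \bigl(\mathbf{x}_n(i) - \mathbf{q}(\mathbf{x}_l(i) + \nu_{nl}(i))\bigr) - \bar{H}_n^T\bigl(\mathbf{z}_n(i) - \bar{H}_n\mathbf{x}_n(i)\bigr) \Bigr] \tag{11}$$

In (11), $b > 0$ is a constant and $\{\alpha(i)\}_{i \ge 0}$ is a sequence of weights with properties to be defined below.
Algorithm (11) is distributed because for sensor $n$ it involves only the data from the sensors in its neighborhood $\Omega_n(i)$.
Using eqn. (8), the state update can be written as

$$\mathbf{x}_n(i+1) = \mathbf{x}_n(i) - \alpha(i)\Bigl[\, b \sum_{l \in \Omega_n(i)} \bigl(\mathbf{x}_n(i) - \mathbf{x}_l(i)\bigr) - \bar{H}_n^T\bigl(\mathbf{z}_n(i) - \bar{H}_n\mathbf{x}_n(i)\bigr) - b \sum_{l \in \Omega_n(i)} \nu_{nl}(i) - b \sum_{l \in \Omega_n(i)} \varepsilon_{nl}(i) \Bigr] \tag{12}$$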



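We rewrite (12) in compact form. Define the random vectors $\Upsilon(i)$ and $\Psi(i) \in \mathbb{R}^{NM \times 1}$ with vector components

$$\Upsilon_n(i) = -\sum_{l \in \Omega_n(i)} \nu_{nl}(i) \tag{13}$$

$$\Psi_n(i) = -\sum_{l \in \Omega_n(i)} \varepsilon_{nl}(i) \tag{14}$$

It follows from the Schuchman conditions on the dither, see Section I-B, that

$$\mathbb{E}[\Upsilon(i)] = \mathbb{E}[\Psi(i)] = \mathbf{0}, \quad \forall i \tag{15}$$

$$\sup_i \mathbb{E}\bigl[\|\Upsilon(i)\|^2\bigr] = \sup_i \mathbb{E}\bigl[\|\Psi(i)\|^2\bigr] \le \frac{N(N-1)M\Delta^2}{12} \tag{16}$$

from which we then have

$$\sup_i \mathbb{E}\bigl[\|\Upsilon(i) + \Psi(i)\|^2\bigr] \le 2\sup_i \mathbb{E}\bigl[\|\Upsilon(i)\|^2\bigr] + 2\sup_i \mathbb{E}\bigl[\|\Psi(i)\|^2\bigr] \le \frac{N(N-1)M\Delta^2}{3} \tag{17}$$

Also, define the noise covariance matrix $S_q$ as

$$S_q = \mathbb{E}\bigl[(\Upsilon(i) + \Psi(i))(\Upsilon(i) + \Psi(i))^T\bigr] \tag{18}$$

The iterations in (11) can be written in compact form as:

$$\mathbf{x}(i+1) = \mathbf{x}(i) - \alpha(i)\Bigl[\, b\,(L(i) \otimes I_M)\,\mathbf{x}(i) - \bar{D}_H\bigl(\mathbf{z}(i) - \bar{D}_H^T\mathbf{x}(i)\bigr) + b\Upsilon(i) + b\Psi(i) \Bigr] \tag{19}$$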



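Here, $\mathbf{x}(i) = [\mathbf{x}_1^T(i) \cdots \mathbf{x}_N^T(i)]^T$ is the vector of sensor states (estimates) and, likewise, $\mathbf{z}(i)$ stacks the sensor
observations $\mathbf{z}_n(i)$. The sequence of Laplacian matrices $\{L(i)\}_{i \ge 0}$ captures the topology of the sensor network. They are
random (see Section I-B) to accommodate link failures, which occur in packet communications. We also define the matrices
$\bar{D}_H$ and $D_{\bar{H}}$ as

$$\bar{D}_H = \mathrm{diag}\bigl[\bar{H}_1^T \cdots \bar{H}_N^T\bigr] \quad \text{and} \quad D_{\bar{H}} = \bar{D}_H\bar{D}_H^T = \mathrm{diag}\bigl[\bar{H}_1^T\bar{H}_1 \cdots \bar{H}_N^T\bar{H}_N\bigr] \tag{20}$$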



We refer to the recursive estimation algorithm in eqn. (19) as CU. We now summarize formally the assumptions
on the CU algorithm and their implications.
A.1) Observation Noise. Recall the observation model in eqn. (10). We assume that the observation noise process,
$\{\zeta(i) = [\zeta_1^T(i), \cdots, \zeta_N^T(i)]^T\}_{i \ge 0}$, is an i.i.d. zero-mean process with finite second moment. In particular, the
observation noise covariance is independent of $i$:

$$\mathbb{E}\bigl[\zeta(i)\zeta^T(j)\bigr] = S_\zeta\,\delta_{ij}, \quad \forall i, j \ge 0 \tag{21}$$

where the Kronecker symbol $\delta_{ij} = 1$ if $i = j$ and zero otherwise. Note that the observation noises at different
sensors may be correlated during a particular iteration. Eqn. (21) states only temporal independence. The spatial
correlation of the observation noise makes our model applicable to practical sensor network problems, for instance,
for distributed target localization, where the observation noise is generally correlated across sensors.
A.2) Observability. We assume that the observation matrices, $\{[H_1(i), \cdots, H_N(i)]\}_{i \ge 0}$, form an i.i.d. sequence
with mean $[\bar{H}_1, \cdots, \bar{H}_N]$ and finite second moment. In particular, we have

$$H_n(i) = \bar{H}_n + \tilde{H}_n(i), \quad \forall i, n \tag{22}$$

where $\bar{H}_n = \mathbb{E}[H_n(i)]$, $\forall i, n$, and $\{[\tilde{H}_1(i), \cdots, \tilde{H}_N(i)]\}_{i \ge 0}$ is a zero-mean i.i.d. sequence with finite second
moment. Here, also, we require only temporal independence of the observation matrices, but allow them to be
spatially correlated. We require the following global observability condition. The matrix $G$

$$G = \sum_{n=1}^{N} \bar{H}_n^T\bar{H}_n \tag{23}$$

is full-rank. This distributed observability extends the observability condition for a centralized estimator to get a
consistent estimate of the parameter $\theta^*$. We note that the information available to the $n$-th sensor at any time $i$
about the corresponding observation matrix is just the mean $\bar{H}_n$, and not the random $H_n(i)$. Hence, the state
update equation uses only the $\bar{H}_n$'s, as given in eqn. (11).
A.3) Persistence Condition. The weight sequence $\{\alpha(i)\}_{i \ge 0}$ satisfies

$$\alpha(i) > 0, \quad \sum_{i \ge 0} \alpha(i) = \infty, \quad \sum_{i \ge 0} \alpha^2(i) < \infty \tag{24}$$

This condition is commonly assumed in adaptive control and signal processing and implies, in particular, that
$\alpha(i) \to 0$. Examples include

$$\alpha(i) = \frac{1}{i^\beta}, \quad .5 < \beta \le 1 \tag{25}$$

A.4) Independence Assumptions. The sequences $\{L(i)\}_{i \ge 0}$, $\{\zeta_n(i)\}_{1 \le n \le N,\, i \ge 0}$, $\{\tilde{H}_n(i)\}_{1 \le n \le N,\, i \ge 0}$, $\{\nu^m_{nl}(i)\}$ are
mutually independent.

Markov. Consider the filtration, $\{\mathcal{F}^{\mathbf{x}}_i\}_{i \ge 0}$, given by

$$\mathcal{F}^{\mathbf{x}}_i = \sigma\bigl(\mathbf{x}(0), \{L(j), \mathbf{z}(j), \Upsilon(j), \Psi(j)\}_{0 \le j < i}\bigr) \tag{26}$$

It then follows that the random objects $L(i)$, $\mathbf{z}(i)$, $\Upsilon(i)$, $\Psi(i)$ are independent of $\mathcal{F}^{\mathbf{x}}_i$, rendering $\{\mathbf{x}(i), \mathcal{F}^{\mathbf{x}}_i\}_{i \ge 0}$ a
Markov process.
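
The update rule (11) can be sketched in a few lines for a toy network. The example below is a minimal simulation under stated assumptions, not the authors' implementation: there are $N = 2$ sensors and $M = 2$ parameters, sensor $n$ observes only component $n$ of $\theta^*$ through a fixed (non-random) observation row, the single link is up i.i.d. with a chosen probability, and the weights $\alpha(i) = 0.3/(i+1)^{0.6}$ are one choice satisfying (24). The function name `run_cu` and all default values are hypothetical.

```python
import math
import random


def quantize(vec, delta):
    """Component-wise uniform quantizer of step `delta` (eq. (5))."""
    return [delta * math.floor(c / delta + 0.5) for c in vec]


def run_cu(theta, iterations, b=1.0, delta=0.1, noise_std=0.1,
           link_up_prob=0.7, seed=0):
    """Simulate update (11) for N = 2 sensors and M = 2 parameters.

    Sensor n observes only component n of theta, so the matrix G of (23)
    is the 2 x 2 identity and hence full rank. Setting delta = 0 disables
    quantization; noise_std = 0 disables observation noise."""
    rng = random.Random(seed)
    x = [[0.0, 0.0], [0.0, 0.0]]                 # deterministic initial estimates
    for i in range(iterations):
        alpha = 0.3 / (i + 1) ** 0.6             # satisfies condition (24)
        link_up = rng.random() < link_up_prob    # i.i.d. link failures
        new_x = []
        for n in range(2):
            l = 1 - n                            # the only possible neighbor
            step = [0.0, 0.0]
            if link_up:
                if delta > 0.0:
                    nu = [rng.uniform(-delta / 2.0, delta / 2.0) for _ in range(2)]
                    received = quantize([x[l][m] + nu[m] for m in range(2)], delta)
                else:
                    received = list(x[l])
                for m in range(2):
                    step[m] += b * (x[n][m] - received[m])   # consensus term
            z = theta[n] + rng.gauss(0.0, noise_std)         # observation (10)
            step[n] -= z - x[n][n]                           # innovation term
            new_x.append([x[n][m] - alpha * step[m] for m in range(2)])
        x = new_x                                # synchronous update of both sensors
    return x


# Noise-free, unquantized, always-connected run: a deterministic sanity check.
x_clean = run_cu([1.0, -2.0], 5000, delta=0.0, noise_std=0.0, link_up_prob=1.0)
# Run with observation noise, dithered quantization, and link failures.
x_noisy = run_cu([1.0, -2.0], 5000)
```

In the noise-free run, each sensor recovers both components of the parameter even though it observes only one of them, which is the role of the consensus term. The noisy run depends on the seed, so no specific output is claimed for it here.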

B. Consistency of CU 

We recall standard definitions from sequential estimation theory (see, for example, [35]).
Definition 1 (Consistency): A sequence of estimates $\{\mathbf{x}^{\bullet}(i)\}_{i \ge 0}$ is called consistent if

$$\mathbb{P}_{\theta^*}\Bigl[\lim_{i \to \infty} \mathbf{x}^{\bullet}(i) = \theta^*\Bigr] = 1, \quad \forall \theta^* \in \mathcal{U} \tag{27}$$

or, in other words, if the estimate sequence converges a.s. to the true parameter value. The above definition of
consistency is also called strong consistency. When the convergence is in probability, we get weak consistency. In
this paper, we use the term consistency to mean strong consistency, which implies weak consistency.

Definition 2 (Asymptotic Unbiasedness): A sequence of estimates $\{\mathbf{x}^{\bullet}(i)\}_{i \ge 0}$ is called asymptotically unbiased if

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\bigl[\mathbf{x}^{\bullet}(i)\bigr] = \theta^*, \quad \forall \theta^* \in \mathcal{U} \tag{28}$$

The main result of this subsection concerns the consistency and asymptotic unbiasedness of the CU algorithm. 
Before proceeding further we state the following result. 

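Lemma 3 Consider the CU algorithm under Assumptions A.1-4. Then, the matrix $[b\bar{L} \otimes I_M + D_{\bar{H}}]$ is symmetric
positive definite.

Proof: Symmetry is obvious. It also follows from the properties of Laplacian matrices and the structure
of $D_{\bar{H}}$ that the matrices $\bar{L} \otimes I_M$ and $D_{\bar{H}}$ are positive semidefinite. Then the matrix $[b\bar{L} \otimes I_M + D_{\bar{H}}]$ is positive
semidefinite, being the sum of two positive semidefinite matrices. To prove positive definiteness, assume, on the contrary,
that the matrix $[b\bar{L} \otimes I_M + D_{\bar{H}}]$ is not positive definite. Then, there exists $\mathbf{x} \in \mathbb{R}^{NM \times 1}$ such that $\mathbf{x} \ne \mathbf{0}$ and

$$\mathbf{x}^T\bigl[b\bar{L} \otimes I_M + D_{\bar{H}}\bigr]\mathbf{x} = 0 \tag{29}$$

From the positive semidefiniteness of $\bar{L} \otimes I_M$ and $D_{\bar{H}}$, and the fact that $b > 0$, it follows that

$$\mathbf{x}^T\bigl[\bar{L} \otimes I_M\bigr]\mathbf{x} = 0, \quad \mathbf{x}^T D_{\bar{H}}\,\mathbf{x} = 0 \tag{30}$$

Write $\mathbf{x}$ in the partitioned form,

$$\mathbf{x} = \bigl[\mathbf{x}_1^T \cdots \mathbf{x}_N^T\bigr]^T, \quad \mathbf{x}_n \in \mathbb{R}^{M \times 1}, \quad \forall\, 1 \le n \le N \tag{31}$$

It follows from the properties of Laplacian matrices and the fact that $\lambda_2(\bar{L}) > 0$ that the first equality in eqn. (30) holds iff

$$\mathbf{x}_n = \mathbf{a}, \quad \forall n \tag{32}$$

where $\mathbf{a} \in \mathbb{R}^{M \times 1}$, and $\mathbf{a} \ne \mathbf{0}$. Also, eqn. (30) implies

$$\sum_{n=1}^{N} \mathbf{x}_n^T\bar{H}_n^T\bar{H}_n\mathbf{x}_n = 0 \tag{33}$$

This together with eqn. (32) implies

$$\mathbf{a}^T G\,\mathbf{a} = 0 \tag{34}$$

where $G$ is defined in eqn. (23). This is clearly a contradiction, because $G$ is positive definite by Assumption A.2
and $\mathbf{a} \ne \mathbf{0}$. Thus, we conclude that the matrix $[b\bar{L} \otimes I_M + D_{\bar{H}}]$ is positive definite. ■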
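
A two-sensor scalar case makes the lemma tangible. The instance below is an assumed illustration, not taken from the paper: $N = 2$, $M = 1$, a single link in the mean graph, and only the first sensor observes the parameter ($\bar{H}_1 = 1$, $\bar{H}_2 = 0$, so $G = 1$). Neither summand is invertible on its own, yet their sum is positive definite, as the determinant and trace show.

```latex
% Illustrative case (assumed): N = 2, M = 1, \bar{H}_1 = 1, \bar{H}_2 = 0, b > 0.
\[
\bar{L} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \qquad
D_{\bar{H}} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
b\bar{L} + D_{\bar{H}} = \begin{bmatrix} b + 1 & -b \\ -b & b \end{bmatrix},
\]
\[
\det\bigl(b\bar{L} + D_{\bar{H}}\bigr) = b(b + 1) - b^2 = b > 0, \qquad
\operatorname{tr}\bigl(b\bar{L} + D_{\bar{H}}\bigr) = 2b + 1 > 0 .
\]
% Both eigenvalues are therefore positive, although \det \bar{L} = \det D_{\bar{H}} = 0.
```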
We now present the following result regarding the asymptotic unbiasedness of the estimate sequence. 

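Theorem 4 (CU: Asymptotic unbiasedness) Consider the CU algorithm under Assumptions A.1-4 and let $\{\mathbf{x}(i)\}_{i \ge 0}$
be the state sequence generated. Then we have

$$\lim_{i \to \infty} \mathbb{E}\bigl[\mathbf{x}_n(i)\bigr] = \theta^*, \quad 1 \le n \le N \tag{35}$$

In other words, the estimate sequence, $\{\mathbf{x}_n(i)\}_{i \ge 0}$, generated at a sensor $n$ is asymptotically unbiased.

Proof: Taking expectations on both sides of eqn. (19) and using the independence assumptions
(Assumption A.4), we have

$$\mathbb{E}[\mathbf{x}(i+1)] = \mathbb{E}[\mathbf{x}(i)] - \alpha(i)\bigl[\, b\,(\bar{L} \otimes I_M)\,\mathbb{E}[\mathbf{x}(i)] + D_{\bar{H}}\,\mathbb{E}[\mathbf{x}(i)] - \bar{D}_H\,\mathbb{E}[\mathbf{z}(i)] \bigr] \tag{36}$$

Subtracting $\mathbf{1}_N \otimes \theta^*$ from both sides of eqn. (36) and noting that

$$(\bar{L} \otimes I_M)(\mathbf{1}_N \otimes \theta^*) = \mathbf{0}, \quad \bar{D}_H\,\mathbb{E}[\mathbf{z}(i)] = D_{\bar{H}}(\mathbf{1}_N \otimes \theta^*) \tag{37}$$

we have

$$\mathbb{E}[\mathbf{x}(i+1)] - \mathbf{1}_N \otimes \theta^* = \bigl[I_{NM} - \alpha(i)\bigl(b\bar{L} \otimes I_M + D_{\bar{H}}\bigr)\bigr]\bigl[\mathbb{E}[\mathbf{x}(i)] - \mathbf{1}_N \otimes \theta^*\bigr] \tag{38}$$

Define $\lambda_{\min}(b\bar{L} \otimes I_M + D_{\bar{H}})$ and $\lambda_{\max}(b\bar{L} \otimes I_M + D_{\bar{H}})$ to be the smallest and largest eigenvalues of the positive
definite matrix $[b\bar{L} \otimes I_M + D_{\bar{H}}]$ (see Lemma 3). Since $\alpha(i) \to 0$ (Assumption A.3), there exists $i_0$ such that

$$\alpha(i) \le \frac{1}{\lambda_{\max}(b\bar{L} \otimes I_M + D_{\bar{H}})}, \quad \forall i \ge i_0 \tag{39}$$

Continuing the recursion in eqn. (38), we have, for $i > i_0$,

$$\mathbb{E}[\mathbf{x}(i)] - \mathbf{1}_N \otimes \theta^* = \Bigl(\prod_{j = i_0}^{i - 1}\bigl[I_{NM} - \alpha(j)\bigl(b\bar{L} \otimes I_M + D_{\bar{H}}\bigr)\bigr]\Bigr)\bigl[\mathbb{E}[\mathbf{x}(i_0)] - \mathbf{1}_N \otimes \theta^*\bigr] \tag{40}$$

Eqn. (40) implies

$$\bigl\|\mathbb{E}[\mathbf{x}(i)] - \mathbf{1}_N \otimes \theta^*\bigr\| \le \Bigl(\prod_{j = i_0}^{i - 1}\bigl\|I_{NM} - \alpha(j)\bigl(b\bar{L} \otimes I_M + D_{\bar{H}}\bigr)\bigr\|\Bigr)\bigl\|\mathbb{E}[\mathbf{x}(i_0)] - \mathbf{1}_N \otimes \theta^*\bigr\|, \quad i > i_0 \tag{41}$$

It follows from eqn. (39) that

$$\bigl\|I_{NM} - \alpha(j)\bigl(b\bar{L} \otimes I_M + D_{\bar{H}}\bigr)\bigr\| = 1 - \alpha(j)\,\lambda_{\min}\bigl(b\bar{L} \otimes I_M + D_{\bar{H}}\bigr), \quad j \ge i_0 \tag{42}$$

Eqns. (41), (42) now give

$$\bigl\|\mathbb{E}[\mathbf{x}(i)] - \mathbf{1}_N \otimes \theta^*\bigr\| \le \Bigl(\prod_{j = i_0}^{i - 1}\bigl(1 - \alpha(j)\,\lambda_{\min}(b\bar{L} \otimes I_M + D_{\bar{H}})\bigr)\Bigr)\bigl\|\mathbb{E}[\mathbf{x}(i_0)] - \mathbf{1}_N \otimes \theta^*\bigr\|, \quad i > i_0 \tag{43}$$

Using the inequality $1 - a \le e^{-a}$, for $0 \le a \le 1$, we finally get

$$\bigl\|\mathbb{E}[\mathbf{x}(i)] - \mathbf{1}_N \otimes \theta^*\bigr\| \le e^{-\lambda_{\min}(b\bar{L} \otimes I_M + D_{\bar{H}})\sum_{j = i_0}^{i - 1}\alpha(j)}\,\bigl\|\mathbb{E}[\mathbf{x}(i_0)] - \mathbf{1}_N \otimes \theta^*\bigr\|, \quad i > i_0 \tag{44}$$

Since $\lambda_{\min}(b\bar{L} \otimes I_M + D_{\bar{H}}) > 0$ and the weight sequence sums to infinity, we have

$$\lim_{i \to \infty}\bigl\|\mathbb{E}[\mathbf{x}(i)] - \mathbf{1}_N \otimes \theta^*\bigr\| = 0 \tag{45}$$

and the theorem follows. ■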
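
The steps of the proof can be traced numerically on a small instance. The sketch below is an illustration under assumptions: it uses $N = 2$, $M = 1$, $b = 1$, a single link in the mean graph, and only sensor 0 observing the parameter, so that $b\bar{L} + D_{\bar{H}}$ is the $2 \times 2$ matrix `Q` below. It iterates the mean-error recursion (38) with weights of the form (25), shifted to start at $i = 0$, finds the index $i_0$ of (39), and checks the exponential bound (44) at every later step. All variable names are hypothetical.

```python
import math


def sym2_eigs(Q):
    """Eigenvalues (min, max) of a symmetric 2 x 2 matrix."""
    a, b, d = Q[0][0], Q[0][1], Q[1][1]
    root = math.sqrt((a - d) ** 2 + 4.0 * b * b)
    return (a + d - root) / 2.0, (a + d + root) / 2.0


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def alpha(i):
    """Weights of the form (25), shifted so that the index starts at 0."""
    return 1.0 / (i + 1)


Q = [[2.0, -1.0], [-1.0, 1.0]]          # b * L_mean + D_H for the assumed instance
lam_min, lam_max = sym2_eigs(Q)

# Smallest index from which alpha(i) <= 1 / lam_max, as in eq. (39).
i0 = next(i for i in range(1000) if alpha(i) <= 1.0 / lam_max)

e = [3.0, -1.0]                         # assumed initial mean error
bound_holds = True
weight_sum, e_i0 = 0.0, None
for i in range(200):
    if i == i0:
        e_i0 = norm(e)
    if i > i0:
        # eq. (44): ||e(i)|| <= exp(-lam_min * sum_{j=i0}^{i-1} alpha(j)) * ||e(i0)||
        bound = math.exp(-lam_min * weight_sum) * e_i0
        bound_holds = bound_holds and (norm(e) <= bound + 1e-12)
    if i >= i0:
        weight_sum += alpha(i)
    a = alpha(i)
    e = [e[0] - a * (Q[0][0] * e[0] + Q[0][1] * e[1]),
         e[1] - a * (Q[1][0] * e[0] + Q[1][1] * e[1])]   # eq. (38)
final_error = norm(e)
```

The early iterations, before $i_0$, need not be contractive, which is why the bound is stated relative to the error at time $i_0$ rather than at time zero.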
We prove that, under the assumptions of the CU algorithm (see Subsection II-A), the state sequence, $\{\mathbf{x}(i)\}_{i \ge 0}$,
satisfies

$$\mathbb{P}_{\theta^*}\Bigl[\lim_{i \to \infty} \mathbf{x}_n(i) = \theta^*, \ \forall n\Bigr] = 1 \tag{46}$$

In other words, the sensor states reach consensus asymptotically and converge a.s. to the true parameter value, $\theta^*$,
thus yielding a consistent estimate at each sensor.

In the following, we present some classical results on stochastic approximation from [36] regarding the
convergence properties of generic stochastic recursive procedures, which will be used to characterize the convergence
properties (consistency, convergence rate) of the CU algorithm.

Theorem 5 Let {x(i) £ R'^^j .^^ be a random vector sequence in M}^^, which evolves according to: 

x(i + 1) x(i) + a{i) [R{x{i)) +r{i + l, x{i),uj)] (47) 

where, i?(-) : M'^^ i — > R'^^ is Borel measurable and {r(i, x, a;)}j>Q xgr'xi ^ family of random vectors in 
R'^^, defined on some probability space (f7,JF, T'), and e 51 is a canonical element of 51. Consider the following 
sets of assumptions: 

B.l): The function r{i, ■, ■) : R^""^ x fl — > R'^^ is (g) measurablj] for every i. 

^B^ denotes the Borel algebra of R'^^. 
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B.2): There exists a filtration {^i}j>o of ^ ^ such that, for each i, the family of random vectors {F (i, x, w)}xgRixi 
is Ti measurable, zero-mean and independent of Ti-\. 

(Note that, if Assumptions B.l, B.2 are satisfied, the process, {x(i)} -^p, is Markov.) 

B.3): There exists a function V (yi) G C2 with bounded second order partial derivatives and a point x* e M'^^ 
satisfying: 

V (x*) = 0, y (x) > 0, X =^ X*, lim||x|Hoo V" (x) = 00 (48) 
sup.<||x-x-||<i (x) , (x)) < 0, Ve > (49) 
B.4): There exist constants k\,ki > 0, such that, 

||i?(x)||VE[||r(z + l,x,c^)||'] <fci(l + y(x))-A:2(i?(x),14(x)) (50) 

B.5): The weight sequence $\{\alpha(i)\}_{i\ge 0}$ satisfies

$$\alpha(i) > 0, \quad \sum_{i\ge 0}\alpha(i) = \infty, \quad \sum_{i\ge 0}\alpha^2(i) < \infty \tag{51}$$

C.1): The function $R(x)$ admits the representation

$$R(x) = B(x - x^*) + \delta(x) \tag{52}$$

where

$$\lim_{x\to x^*} \frac{\|\delta(x)\|}{\|x - x^*\|} = 0 \tag{53}$$

(Note, in particular, if $\delta(x) \equiv 0$, then eqn. (53) is satisfied.)
C.2): The weight sequence $\{\alpha(i)\}_{i\ge 0}$ is of the form

$$\alpha(i) = \frac{a}{i+1}, \quad \forall i \ge 0 \tag{54}$$

where $a > 0$ is a constant. (Note that C.2 implies B.5.)
C.3): The matrix $\Sigma$, given by

$$\Sigma = aB + \frac{1}{2}I \tag{55}$$

is stable. Here $I$ is the $l\times l$ identity matrix and $a$, $B$ are given in eqns. (54), (52), respectively.
C.4): The entries of the matrices

$$A(i, x) = E\left[\Gamma(i+1, x, \omega)\,\Gamma^T(i+1, x, \omega)\right], \quad \forall i \ge 0,\ x\in\mathbb{R}^{l\times 1} \tag{56}$$

are finite and the following limit exists:

$$\lim_{i\to\infty,\ x\to x^*} A(i, x) = S_0 \tag{57}$$
C.5): There exists $\epsilon > 0$, such that

$$\lim_{R\to\infty}\ \sup_{\|x - x^*\| < \epsilon}\ \sup_{i\ge 0} \int_{\|\Gamma(i+1, x, \omega)\| > R} \|\Gamma(i+1, x, \omega)\|^2\, d\mathbb{P} = 0 \tag{58}$$



Then we have the following: 

Let Assumptions B.1-B.5 hold for the process $\{x(i)\}_{i\ge 0}$ given by eqn. (47). Then, starting from an arbitrary initial state, the Markov process $\{x(i)\}_{i\ge 0}$ converges a.s. to $x^*$. In other words,

$$\mathbb{P}\left[\lim_{i\to\infty} x(i) = x^*\right] = 1 \tag{59}$$

The normalized process $\{\sqrt{i}\,(x(i) - x^*)\}_{i\ge 0}$ is asymptotically normal if, in addition to Assumptions B.1-B.5, Assumptions C.1-C.5 are also satisfied. In particular, as $i\to\infty$

$$\sqrt{i}\,(x(i) - x^*) \Longrightarrow \mathcal{N}(0, S) \tag{60}$$

where $\Longrightarrow$ denotes convergence in distribution or weak convergence. Also, the asymptotic variance, $S$, in eqn. (60) is given by

$$S = a^2\int_0^\infty e^{\Sigma v}\, S_0\, e^{\Sigma^T v}\, dv \tag{61}$$

Proof: For a proof see [36] (c.f. Theorems 4.4.4, 6.6.1). ■ 
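
The recursion in eqn. (47) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from [36] or from the algorithms of this paper: a scalar state, the drift $R(x) = -(x - x^*)$ (so $B = -1$ and $\delta \equiv 0$), i.i.d. zero-mean Gaussian perturbations, and weights $\alpha(i) = a/(i+1)$ with $a = 1$. The function name `run_recursion` is hypothetical.

```python
import random


def run_recursion(x0, x_star, a=1.0, steps=20000, noise_std=1.0, seed=0):
    """Scalar instance of x(i+1) = x(i) + alpha(i) * [R(x(i)) + Gamma(i+1)]."""
    rng = random.Random(seed)
    x = x0
    for i in range(steps):
        alpha = a / (i + 1)                # weight of the form in Assumption C.2
        drift = -(x - x_star)              # assumed R(x) = -(x - x*)
        noise = rng.gauss(0.0, noise_std)  # assumed zero-mean i.i.d. perturbation
        x = x + alpha * (drift + noise)
    return x


estimate = run_recursion(x0=10.0, x_star=2.0)
print(abs(estimate - 2.0))  # distance of the final iterate from x*
```

With $a = 1$ the iterate reduces to the running average of $x^*$ plus noise, and $\Sigma = aB + \frac{1}{2} = -\frac{1}{2}$ is stable, so this toy case falls under both parts of the theorem.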
In the sequel, we will use Theorem 5 to establish the consistency and asymptotic normality of the $\mathcal{LU}$ algorithm. We now give the main result regarding the a.s. convergence of the iterate sequence.

Theorem 6 ($\mathcal{LU}$: Consistency) Consider the $\mathcal{LU}$ algorithm with the assumptions stated in Subsection II-A. Then,

$$\mathbb{P}\left[\lim_{i\to\infty} x_n(i) = \theta^*,\ \forall n\right] = 1 \tag{62}$$

In other words, the estimate sequence $\{x_n(i)\}_{i\ge 0}$ at a sensor $n$ is a consistent estimate of the parameter $\theta^*$.

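Proof: The proof follows by showing that the process $\{x(i)\}_{i\ge 0}$, generated by the $\mathcal{LU}$ algorithm, satisfies Assumptions B.1-B.5 of Theorem 5. Recall the filtration, $\{\mathcal{F}_i\}_{i\ge 0}$, given in eqn. (26). By adding and subtracting the vector $1_N\otimes\theta^*$ and noting that

$$(\bar L\otimes I_M)(1_N\otimes\theta^*) = 0 \tag{63}$$

eqn. (19) can be written as

$$x(i+1) = x(i) - \alpha(i)\Big[b(\bar L\otimes I_M)(x(i) - 1_N\otimes\theta^*) + b(\tilde L(i)\otimes I_M)x(i) + D_{\bar H}D_{\bar H}^T(x(i) - 1_N\otimes\theta^*) - \big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big) + b\Upsilon(i) + b\Psi(i)\Big] \tag{64}$$

In the notation of Theorem 5, eqn. (64) can be written as

$$x(i+1) = x(i) + \alpha(i)\left[R(x(i)) + \Gamma(i+1, x(i), \omega)\right] \tag{65}$$

where

$$R(x) = -\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right](x - 1_N\otimes\theta^*) \tag{66}$$
$$\Gamma(i+1, x, \omega) = -\Big[b(\tilde L(i)\otimes I_M)x - \big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big) + b\Upsilon(i) + b\Psi(i)\Big] \tag{67}$$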



Under Assumptions A.1-A.4, for fixed $i+1$, the random family $\{\Gamma(i+1, x, \omega)\}_{x\in\mathbb{R}^{NM\times 1}}$ is $\mathcal{F}_{i+1}$ measurable, zero-mean and independent of $\mathcal{F}_i$. Hence, Assumptions B.1, B.2 of Theorem 5 are satisfied.

We now show the existence of a stochastic potential function $V(\cdot)$ satisfying the remaining Assumptions B.3-B.4 of Theorem 5. To this end, define

$$V(x) = (x - 1_N\otimes\theta^*)^T\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right](x - 1_N\otimes\theta^*) \tag{68}$$

Clearly, $V(x)\in\mathbb{C}_2$ with bounded second order partial derivatives. It follows from the positive definiteness of $\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right]$ (Lemma 3) that

$$V(1_N\otimes\theta^*) = 0, \quad V(x) > 0,\ x \ne 1_N\otimes\theta^* \tag{69}$$



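Since the matrix $\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right]$ is positive definite, the matrix $\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right]^2$ is also positive definite and hence, there exists a constant $c_1 > 0$, such that

$$(x - 1_N\otimes\theta^*)^T\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right]^2(x - 1_N\otimes\theta^*) \ge c_1\|x - 1_N\otimes\theta^*\|^2, \quad \forall x\in\mathbb{R}^{NM\times 1} \tag{70}$$

It then follows that

$$\begin{aligned}
\sup_{\|x - 1_N\otimes\theta^*\| > \epsilon}\left(R(x), V_x(x)\right) &= -2\inf_{\|x - 1_N\otimes\theta^*\| > \epsilon}(x - 1_N\otimes\theta^*)^T\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right]^2(x - 1_N\otimes\theta^*)\\
&\le -2\inf_{\|x - 1_N\otimes\theta^*\| > \epsilon} c_1\|x - 1_N\otimes\theta^*\|^2\\
&\le -2c_1\epsilon^2 < 0
\end{aligned} \tag{71}$$

Thus, Assumption B.3 is satisfied. From eqn. (66),

$$\|R(x)\|^2 = (x - 1_N\otimes\theta^*)^T\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right]^2(x - 1_N\otimes\theta^*) = -\frac{1}{2}\left(R(x), V_x(x)\right)$$

From eqn. (67) and the independence assumptions (Assumption A.4),

$$E\left[\|\Gamma(i+1, x, \omega)\|^2\right] = E\left[(x - 1_N\otimes\theta^*)^T(b\tilde L(i)\otimes I_M)^2(x - 1_N\otimes\theta^*)\right] + E\left[\left\|D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\right\|^2\right] + b^2E\left[\|\Upsilon(i) + \Psi(i)\|^2\right] \tag{72}$$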
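Since the random matrix $\tilde L(i)$ takes values in a finite set, there exists a constant $c_2 > 0$, such that

$$(x - 1_N\otimes\theta^*)^T(b\tilde L(i)\otimes I_M)^2(x - 1_N\otimes\theta^*) \le c_2\|x - 1_N\otimes\theta^*\|^2, \quad \forall x\in\mathbb{R}^{NM\times 1} \tag{73}$$

Again, since $\left(b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right)$ is positive definite, there exists a constant $c_3 > 0$, such that

$$(x - 1_N\otimes\theta^*)^T\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right](x - 1_N\otimes\theta^*) \ge c_3\|x - 1_N\otimes\theta^*\|^2, \quad \forall x\in\mathbb{R}^{NM\times 1} \tag{74}$$

We then have from eqns. (73), (74)

$$E\left[(x - 1_N\otimes\theta^*)^T(b\tilde L(i)\otimes I_M)^2(x - 1_N\otimes\theta^*)\right] \le \frac{c_2}{c_3}(x - 1_N\otimes\theta^*)^T\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right](x - 1_N\otimes\theta^*) = c_4V(x) \tag{75}$$

for some constant $c_4 = \frac{c_2}{c_3} > 0$. The term $E\left[\left\|D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\right\|^2\right] + b^2E\left[\|\Upsilon(i) + \Psi(i)\|^2\right]$ is bounded by a finite constant $c_5 > 0$, as it follows from Assumptions A.1-A.4. We then have from eqns. (72), (73)

$$\begin{aligned}
\|R(x)\|^2 + E\left[\|\Gamma(i+1, x, \omega)\|^2\right] &\le -\frac{1}{2}\left(R(x), V_x(x)\right) + c_4V(x) + c_5\\
&\le c_6\left(1 + V(x)\right) - \frac{1}{2}\left(R(x), V_x(x)\right)
\end{aligned} \tag{76}$$

where $c_6 = \max(c_4, c_5) > 0$. This verifies Assumption B.4 of Theorem 5. Also, Assumption B.5 is satisfied by the choice of $\{\alpha(i)\}_{i\ge 0}$ (Assumption A.3). It then follows that the process $\{x(i)\}_{i\ge 0}$ converges a.s. to $1_N\otimes\theta^*$. In other words,

$$\mathbb{P}\left[\lim_{i\to\infty} x_n(i) = \theta^*,\ \forall n\right] = 1 \tag{77}$$

which establishes the consistency of the $\mathcal{LU}$ algorithm. ■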



C. Asymptotic Variance: $\mathcal{LU}$

In this subsection, we carry out a convergence rate analysis of the $\mathcal{LU}$ algorithm by studying its moderate deviation characteristics. We summarize here some definitions and terminology from the statistical literature, used to characterize the performance of sequential estimation procedures (see [35]).

Definition 7 (Asymptotic Normality) A sequence of estimates $\{x^\bullet(i)\}_{i\ge 0}$ is asymptotically normal if for every $\theta^*\in\mathcal{U}$, there exists a positive semidefinite matrix $S(\theta^*)\in\mathbb{R}^{M\times M}$, such that,

$$\lim_{i\to\infty}\sqrt{i}\,\left(x^\bullet(i) - \theta^*\right) \Longrightarrow \mathcal{N}\left(0_M, S(\theta^*)\right) \tag{78}$$

The matrix $S(\theta^*)$ is called the asymptotic variance of the estimate sequence $\{x^\bullet(i)\}_{i\ge 0}$.
In the following, we prove the asymptotic normality of the $\mathcal{LU}$ algorithm and explicitly characterize the resulting asymptotic variance. To this end, define



(79) 



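Let $\lambda_{\min}\left(b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right)$ be the smallest eigenvalue of $\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right]$ and recall the definitions of $S_\zeta$, $S_q$ (eqns. (21), (18)).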

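We now state the main result of this subsection, establishing the asymptotic normality of the $\mathcal{LU}$ algorithm.

Theorem 8 ($\mathcal{LU}$: Asymptotic normality and asymptotic efficiency) Consider the $\mathcal{LU}$ algorithm under A.1-A.4 with link weight sequence, $\{\alpha(i)\}_{i\ge 0}$, given by:

$$\alpha(i) = \frac{a}{i+1} \tag{80}$$

for some constant $a > 0$. Let $\{x(i)\}_{i\ge 0}$ be the state sequence generated. Then, if $a > \frac{1}{2\lambda_{\min}\left(b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right)}$, we have

$$\sqrt{i}\,\left(x(i) - 1_N\otimes\theta^*\right) \Longrightarrow \mathcal{N}\left(0, S(\theta^*)\right) \tag{81}$$

where

$$S(\theta^*) = a^2\int_0^\infty e^{\Sigma v}\, S_0\, e^{\Sigma v}\, dv \tag{82}$$
$$\Sigma = -a\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right] + \frac{1}{2}I \tag{83}$$
$$S_0 = S_H + D_{\bar H}S_\zeta D_{\bar H}^T + b^2S_q \tag{84}$$

In particular, at any sensor $n$, the estimate sequence, $\{x_n(i)\}_{i\ge 0}$, is asymptotically normal:

$$\sqrt{i}\,\left(x_n(i) - \theta^*\right) \Longrightarrow \mathcal{N}\left(0, S_{nn}(\theta^*)\right) \tag{85}$$

where $S_{nn}(\theta^*)\in\mathbb{R}^{M\times M}$ denotes the $n$-th principal block of $S(\theta^*)$.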

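Proof: The proof involves a step-by-step verification of Assumptions C.1-C.5 of Theorem 5, since Assumptions B.1-B.5 are already shown to be satisfied (see Theorem 6). We recall the definitions of $R(x)$ and $\Gamma(i+1, x, \omega)$ from Theorem 6 (eqns. (66), (67)) and reproduce them here for convenience:

$$R(x) = -\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right](x - 1_N\otimes\theta^*) \tag{86}$$
$$\Gamma(i+1, x, \omega) = -\Big[b(\tilde L(i)\otimes I_M)x - \big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big) + b\Upsilon(i) + b\Psi(i)\Big] \tag{87}$$

From eqn. (86), Assumption C.1 of Theorem 5 is satisfied with

$$B = -\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right] \tag{88}$$

and $\delta(x) \equiv 0$. Assumption C.2 is satisfied by hypothesis, while the condition $a > \frac{1}{2\lambda_{\min}\left(b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right)}$ implies that

$$\Sigma = -a\left[b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right] + \frac{1}{2}I_{NM} = aB + \frac{1}{2}I_{NM} \tag{89}$$

is stable, and hence Assumption C.3 holds. To verify Assumption C.4, we have from Assumption A.4

$$\begin{aligned}
A(i, x) &= E\left[\Gamma(i+1, x, \omega)\,\Gamma^T(i+1, x, \omega)\right]\\
&= b^2E\left[(\tilde L(i)\otimes I_M)\,xx^T\,(\tilde L(i)\otimes I_M)^T\right] + E\left[\big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big)\big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big)^T\right]\\
&\quad + b^2E\left[\big(\Upsilon(i) + \Psi(i)\big)\big(\Upsilon(i) + \Psi(i)\big)^T\right]
\end{aligned} \tag{90}$$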



From the i.i.d. assumptions, we note that all three terms on the R.H.S. of eqn. (90) are independent of $i$, and, in particular, the last two terms are constants. For the first term, we note that

$$\lim_{x\to 1_N\otimes\theta^*} E\left[(\tilde L(i)\otimes I_M)\,xx^T\,(\tilde L(i)\otimes I_M)^T\right] = 0 \tag{91}$$

from the bounded convergence theorem, as the entries of $\{\tilde L(i)\}_{i\ge 0}$ are bounded and

$$(\tilde L(i)\otimes I_M)(1_N\otimes\theta^*) = 0 \tag{92}$$

For the second term on the R.H.S. of eqn. (90), we have

$$E\left[\big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big)\big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big)^T\right] = S_H + D_{\bar H}S_\zeta D_{\bar H}^T \tag{93}$$

where the equality follows from eqns. (79), (21). Finally, we note that the third term on the R.H.S. of eqn. (90) is $b^2S_q$ (see eqn. (18)). We thus have from eqns. (90), (91), (93)

$$\lim_{i\to\infty,\ x\to x^*} A(i, x) = S_H + D_{\bar H}S_\zeta D_{\bar H}^T + b^2S_q = S_0 \tag{94}$$
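We now verify Assumption C.5. Consider a fixed $\epsilon > 0$. We note that eqn. (58) is a restatement of the uniform integrability of the random family $\left\{\|\Gamma(i+1, x, \omega)\|^2\right\}_{i\ge 0,\ \|x - 1_N\otimes\theta^*\| < \epsilon}$. From eqn. (87) we have

$$\begin{aligned}
\|\Gamma(i+1, x, \omega)\|^2 &= \left\|b(\tilde L(i)\otimes I_M)x - \big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big) + b\Upsilon(i) + b\Psi(i)\right\|^2\\
&= \left\|b(\tilde L(i)\otimes I_M)(x - 1_N\otimes\theta^*) - \big(D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\big) + b\Upsilon(i) + b\Psi(i)\right\|^2\\
&\le 9\left[\left\|(b\tilde L(i)\otimes I_M)(x - 1_N\otimes\theta^*)\right\|^2 + \left\|D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\right\|^2 + b^2\|\Upsilon(i) + \Psi(i)\|^2\right]
\end{aligned} \tag{95}$$

where we used the inequality $\|y_1 + y_2 + y_3\|^2 \le 9\left[\|y_1\|^2 + \|y_2\|^2 + \|y_3\|^2\right]$ for vectors $y_1, y_2, y_3$. From eqn. (73) we note that, if $\|x - 1_N\otimes\theta^*\| < \epsilon$,

$$\left\|(b\tilde L(i)\otimes I_M)(x - 1_N\otimes\theta^*)\right\|^2 \le c_2\epsilon^2 \tag{96}$$

From (95), the family $\left\{\tilde\Gamma(i+1, x, \omega)\right\}_{i\ge 0,\ \|x - 1_N\otimes\theta^*\| < \epsilon}$ dominates the family $\left\{\|\Gamma(i+1, x, \omega)\|^2\right\}_{i\ge 0,\ \|x - 1_N\otimes\theta^*\| < \epsilon}$, where

$$\tilde\Gamma(i+1, x, \omega) = 9\left[c_2\epsilon^2 + \left\|D_{\bar H}z(i) - D_{\bar H}D_{\bar H}^T\, 1_N\otimes\theta^*\right\|^2 + b^2\|\Upsilon(i) + \Psi(i)\|^2\right] \tag{97}$$

It is clear that the family $\left\{\tilde\Gamma(i+1, x, \omega)\right\}_{i\ge 0,\ \|x - 1_N\otimes\theta^*\| < \epsilon}$ is i.i.d. and hence uniformly integrable (see [37]). Then the family $\left\{\|\Gamma(i+1, x, \omega)\|^2\right\}_{i\ge 0,\ \|x - 1_N\otimes\theta^*\| < \epsilon}$ is also uniformly integrable, since it is dominated by the uniformly integrable family $\left\{\tilde\Gamma(i+1, x, \omega)\right\}_{i\ge 0,\ \|x - 1_N\otimes\theta^*\| < \epsilon}$ (see [37]). Thus Assumptions C.1-C.5 are verified and the theorem follows. ■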



D. An Example 

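From Theorem 8 and eqn. (79), we note that the asymptotic variance is independent of $\theta^*$ if the observation matrices are non-random. In that case, it is possible to optimize (minimize) the asymptotic variance over the weights $a$ and $b$. In the following, we study a special case that permits explicit computations and leads to interesting results. Consider a scalar parameter ($M = 1$) and let each sensor $n$ have the same i.i.d. observation model,

$$z_n(i) = h\theta^* + \zeta_n(i) \tag{98}$$

where $h \ne 0$ and $\{\zeta_n(i)\}_{i\ge 0,\ 1\le n\le N}$ is a family of independent zero mean Gaussian random variables with variance $\sigma^2$. In addition, assume unquantized inter-sensor exchanges. We define the average asymptotic variance per sensor attained by the algorithm $\mathcal{LU}$ as

$$S_{\mathcal{LU}} = \frac{1}{N}\mathrm{Tr}(S) \tag{99}$$

where $S$ is given by eqn. (82) in Theorem 8. From Theorem 8 we have $S_0 = \sigma^2h^2I_N$ and hence from eqn. (82)

$$S_{\mathcal{LU}} = \frac{a^2}{N}\mathrm{Tr}\left(\int_0^\infty e^{\Sigma v}S_0e^{\Sigma v}\,dv\right) = \frac{a^2\sigma^2h^2}{N}\int_0^\infty\mathrm{Tr}\left(e^{2\Sigma v}\right)dv \tag{100}$$

From eqn. (83), the eigenvalues of $2\Sigma v$ are $\left[-2ab\lambda_n(\bar L) - (2ah^2 - 1)\right]v$ for $1\le n\le N$ and we have

$$\begin{aligned}
S_{\mathcal{LU}} &= \frac{a^2\sigma^2h^2}{N}\sum_{n=1}^{N}\int_0^\infty e^{-\left[2ab\lambda_n(\bar L) + (2ah^2 - 1)\right]v}\,dv\\
&= \frac{a^2\sigma^2h^2}{N}\sum_{n=1}^{N}\frac{1}{2ab\lambda_n(\bar L) + (2ah^2 - 1)}\\
&= \frac{a^2\sigma^2h^2}{N(2ah^2 - 1)} + \frac{a^2\sigma^2h^2}{N}\sum_{n=2}^{N}\frac{1}{2ab\lambda_n(\bar L) + (2ah^2 - 1)}
\end{aligned} \tag{101}$$

where the last step uses $\lambda_1(\bar L) = 0$. In this case, the constraint $a > \frac{1}{2\lambda_{\min}\left(b\bar L\otimes I_M + D_{\bar H}D_{\bar H}^T\right)}$ in Theorem 8 reduces to $a > \frac{1}{2h^2}$, and hence the problem of optimum $a$, $b$ design to minimize $S_{\mathcal{LU}}$ is given by

$$S^*_{\mathcal{LU}} = \inf_{a > \frac{1}{2h^2},\ b > 0} S_{\mathcal{LU}} \tag{102}$$

It is to be noted that the first term on the last step of eqn. (101) is minimized at $a = \frac{1}{h^2}$ and the second term (always non-negative under the constraint) goes to zero as $b\to\infty$ for any fixed $a > 0$. Hence, we have

$$S^*_{\mathcal{LU}} = \frac{\sigma^2}{Nh^2} \tag{103}$$

The above shows that by setting $a = \frac{1}{h^2}$ and $b$ sufficiently large in the $\mathcal{LU}$ algorithm, one can make $S_{\mathcal{LU}}$ arbitrarily close to $S^*_{\mathcal{LU}}$.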

We compare this optimum achievable asymptotic variance per sensor, $S^*_{\mathcal{LU}}$, attained by the distributed $\mathcal{LU}$ algorithm to that attained by a centralized scheme. In the centralized scheme, there is a central estimator, which receives measurements from all the sensors and computes an estimate based on all measurements. In this case, the sample mean estimator is an efficient estimator (in the sense of Cramér-Rao) and the estimate sequence $\{x_c(i)\}_{i\ge 0}$ is given by

$$x_c(i) = \frac{1}{Nih}\sum_{n=1}^{N}\sum_{j=0}^{i-1} z_n(j) \tag{104}$$

and we have

$$\sqrt{i}\,\left(x_c(i) - \theta^*\right) \sim \mathcal{N}(0, S_c) \tag{105}$$

where $S_c$ is the variance (which is also the inverse of the one-step Fisher information in this case, see [35]) and is given by

$$S_c = \frac{\sigma^2}{Nh^2} \tag{106}$$

From eqn. (103) we note that

$$S^*_{\mathcal{LU}} = S_c \tag{107}$$

Thus the average asymptotic variance attainable by the distributed algorithm $\mathcal{LU}$ is the same as that of the optimum (in the sense of Cramér-Rao) centralized estimator having access to all information simultaneously. This is an interesting result, as it holds irrespective of the network topology. In particular, however sparse the inter-sensor communication graph is, the optimum achievable asymptotic variance is the same as that of the centralized efficient estimator. Note that weak convergence itself is a limiting result, and, hence, the rate of convergence in eqn. (81) in Theorem 8 will, in general, depend on the network topology.
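
The closed form for the average asymptotic variance in this example can be evaluated numerically. The following Python sketch is only an illustration: it assumes the mean Laplacian is that of a ring (cycle) graph, whose eigenvalues are $2 - 2\cos(2\pi k/N)$, and the helper names and parameter values are hypothetical. It evaluates the eigenvalue sum for the scalar model of eqn. (98) and compares it with the centralized variance $\sigma^2/(Nh^2)$.

```python
import math


def ring_laplacian_eigs(N):
    """Eigenvalues of the Laplacian of a cycle graph on N nodes (assumed topology)."""
    return [2.0 - 2.0 * math.cos(2.0 * math.pi * k / N) for k in range(N)]


def average_asymptotic_variance(a, b, h, sigma, laplacian_eigs):
    """(a^2 sigma^2 h^2 / N) * sum_n 1 / (2 a b lambda_n + (2 a h^2 - 1))."""
    c = 2.0 * a * h * h - 1.0
    if c <= 0.0:
        raise ValueError("need a > 1 / (2 h^2)")
    N = len(laplacian_eigs)
    total = sum(1.0 / (2.0 * a * b * lam + c) for lam in laplacian_eigs)
    return a * a * sigma * sigma * h * h * total / N


def centralized_variance(h, sigma, N):
    """Variance sigma^2 / (N h^2) of the centralized sample-mean estimator."""
    return sigma * sigma / (N * h * h)


N, h, sigma = 10, 2.0, 1.0
eigs = ring_laplacian_eigs(N)
a_opt = 1.0 / (h * h)  # minimizer of the first term
s_c = centralized_variance(h, sigma, N)
for b in (1.0, 10.0, 1e3, 1e6):
    print(b, average_asymptotic_variance(a_opt, b, h, sigma, eigs), s_c)
```

For fixed $a = 1/h^2$, the printed values decrease toward the centralized variance as $b$ grows, which is the topology-independent limit discussed above; how large $b$ must be for a given gap does depend on the nonzero Laplacian eigenvalues.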

III. Nonlinear Observation Models: Algorithm $\mathcal{NU}$

The previous section developed the algorithm $\mathcal{LU}$ for distributed parameter estimation when the observation model is linear. In this section, we extend the previous development to accommodate more general classes of nonlinear observation models. We comment briefly on the organization of this section. In Subsection III-A, we introduce notation and set up the problem, and in Subsection III-B, we present the $\mathcal{NU}$ algorithm for distributed parameter estimation for nonlinear observation models and establish conditions for its consistency.

A. Problem Formulation-Nonlinear Case 

We start by formally stating the observation and communication assumptions for the generic case.
D.1) Nonlinear Observation Model: Similar to Section II, let $\theta^*\in\mathcal{U}\subset\mathbb{R}^{M\times 1}$ be the true but unknown parameter value. In the general case, we assume that the observation model at each sensor $n$ consists of an i.i.d. sequence $\{z_n(i)\}_{i\ge 0}$ in $\mathbb{R}^{N_n\times 1}$ with

$$\mathbb{P}_{\theta^*}\left[z_n(i)\in\mathcal{D}\right] = \int_{\mathcal{D}} dF_{\theta^*}, \quad \forall\,\mathcal{D}\in\mathcal{B}^{N_n\times 1} \tag{108}$$

where $F_{\theta^*}$ denotes the distribution function of the random vector $z_n(i)$. We assume that the distributed observation model is separably estimable, a notion which we introduce now.

Definition 9 (Separably Estimable) Let $\{z_n(i)\}_{i\ge 0}$ be the i.i.d. observation sequence at sensor $n$, where $1\le n\le N$. We call the parameter estimation problem separably estimable if there exist functions $g_n(\cdot): \mathbb{R}^{N_n\times 1}\to\mathbb{R}^{M\times 1}$, $\forall\, 1\le n\le N$, such that the function $h(\cdot): \mathbb{R}^{M\times 1}\to\mathbb{R}^{M\times 1}$ given by

$$h(\theta) = \frac{1}{N}\sum_{n=1}^{N} E_\theta\left[g_n(z_n(i))\right] \tag{109}$$

is invertible. (The factor $\frac{1}{N}$ in eqn. (109) is just for notational convenience, as will be seen later.)

We will see that this condition is, in fact, necessary and sufficient to guarantee the existence of consistent distributed estimation procedures. This condition is a natural generalization of the observability constraint of Assumption A.2 in the linear model. Indeed, if, assuming the linear model, we define $g_n(z_n(i)) = \bar H_n^T z_n(i)$, $\forall\, 1\le n\le N$, in eqn. (109), we have $h(\theta) = G\theta$, where $G$ is defined in eqn. (23). Then, invertibility of (109) is equivalent to Assumption A.2, i.e., to invertibility of $G$; hence, the linear model is an example of a separably estimable problem. Note that, if an observation model is separably estimable, then the choice of functions $g_n(\cdot)$ is not unique. Indeed, given a separably estimable model, it is important to find an appropriate decomposition, as in eqn. (109), because the convergence properties of the algorithms to be studied are intimately related to the behavior of these functions.
At a particular iteration $i$, we do not require the observations across different sensors to be independent. In other words, we allow spatial correlation, but require temporal independence.

D.2) Random Link Failure, Quantized Communication. The random link failure model is the model given in Section II-B; similarly, we assume quantized inter-sensor communication with subtractive dithering.
D.3) Independence and Moment Assumptions. The sequences $\{L(i)\}_{i\ge 0}$, $\{z_n(i)\}_{1\le n\le N,\ i\ge 0}$, $\{\nu_{nl}^m(i)\}$ (the dither sequence, as defined for the $\mathcal{LU}$ algorithm) are mutually independent. Define the functions $h_n(\cdot): \mathbb{R}^{M\times 1}\to\mathbb{R}^{M\times 1}$ by

$$h_n(\theta) = E_\theta\left[g_n(z_n(i))\right], \quad \forall\, 1\le n\le N \tag{110}$$

We make the assumption:

$$E_\theta\left[\left\|\frac{1}{N}\sum_{n=1}^{N} g_n(z_n(i)) - h(\theta)\right\|^2\right] = \eta(\theta) < \infty, \quad \forall\,\theta\in\mathcal{U} \tag{111}$$

In Subsection III-B and Section IV, we give two algorithms, $\mathcal{NU}$ and $\mathcal{NLU}$, respectively, for the distributed estimation problem D.1-D.3 and provide conditions for consistency and other properties of the estimates.
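
Definition 9 can be illustrated in the linear special case with a small Python sketch. It assumes $G = \frac{1}{N}\sum_n \bar H_n^T\bar H_n$, which is what $h(\theta) = G\theta$ gives when $g_n(z_n) = \bar H_n^T z_n$ and the observation noise is zero-mean; the two observation matrices and the helper names are hypothetical. Neither sensor can identify a two-dimensional $\theta$ on its own, yet the network as a whole is separably estimable.

```python
def gram_average(H_list):
    """Return G = (1/N) * sum_n H_n^T H_n for observation matrices with 2 columns."""
    N = len(H_list)
    G = [[0.0, 0.0], [0.0, 0.0]]
    for H in H_list:
        for row in H:
            for r in range(2):
                for c in range(2):
                    G[r][c] += row[r] * row[c] / N
    return G


def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


H1 = [[1.0, 0.0]]  # hypothetical sensor 1: observes only the first component
H2 = [[0.0, 1.0]]  # hypothetical sensor 2: observes only the second component

G = gram_average([H1, H2])
locally_observable = [det2(gram_average([H])) != 0.0 for H in (H1, H2)]
separably_estimable = det2(G) != 0.0
print(locally_observable, separably_estimable)
```

In this toy network each local Gram matrix is singular while $G$ is invertible, which is the situation the distributed observability condition is meant to cover.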



B. Algorithm $\mathcal{NU}$

In this subsection, we present the algorithm $\mathcal{NU}$ for distributed parameter estimation in separably estimable models under Assumptions D.1-D.3.

Algorithm $\mathcal{NU}$. Each sensor $n$ performs the following estimate update:

$$x_n(i+1) = x_n(i) - \alpha(i)\left[\beta\sum_{l\in\Omega_n(i)}\big(x_n(i) - q(x_l(i) + \nu_{nl}(i))\big) + h_n(x_n(i)) - g_n(z_n(i))\right] \tag{112}$$

based on $x_n(i)$, $\{q(x_l(i) + \nu_{nl}(i))\}_{l\in\Omega_n(i)}$ and $z_n(i)$, which are all available to it at time $i$. The sequence $\{x_n(i)\in\mathbb{R}^{M\times 1}\}_{i\ge 0}$ is the estimate (state) sequence generated at sensor $n$. The weight sequence $\{\alpha(i)\}_{i\ge 0}$ satisfies the persistence condition of Assumption A.3 and $\beta > 0$ is chosen to be an appropriate constant. Similar to eqn. (12), the above update can be written in compact form as

$$x(i+1) = x(i) - \alpha(i)\left[\beta(L(i)\otimes I_M)x(i) + M(x(i)) - J(z(i)) + \Upsilon(i) + \Psi(i)\right] \tag{113}$$

where $\Upsilon(i)$, $\Psi(i)$ are as in eqns. (13)-(16) and $x(i) = [x_1^T(i)\cdots x_N^T(i)]^T$ is the vector of sensor states (estimates). The functions $M(x(i))$ and $J(z(i))$ are given by

$$M(x(i)) = \left[h_1^T(x_1(i))\cdots h_N^T(x_N(i))\right]^T, \quad J(z(i)) = \left[g_1^T(z_1(i))\cdots g_N^T(z_N(i))\right]^T \tag{114}$$
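
The per-sensor update of eqn. (112) can be sketched in Python under simplifying assumptions that are ours and not the paper's: a scalar parameter, a fixed ring network with no link failures, unquantized exchanges, $g_n(z) = z$, and a hypothetical observation model in which even-numbered sensors have $h_n(\theta) = \theta + 0.5\sin\theta$ while odd-numbered sensors observe pure noise ($h_n \equiv 0$). The weights $\alpha(i) = a/(i + i_0)$ are one choice satisfying the persistence condition; all names and parameter values are illustrative.

```python
import math
import random


def h_n(n, theta):
    """Assumed h_n: even sensors are informative, odd sensors see only noise."""
    return theta + 0.5 * math.sin(theta) if n % 2 == 0 else 0.0


def run_nu(theta_star=1.0, N=6, beta=1.0, a=4.0, i0=20, steps=20000,
           noise_std=0.5, seed=1):
    """Simplified NU-style iteration on a ring, without quantization or link failures."""
    rng = random.Random(seed)
    x = [0.0] * N
    for i in range(steps):
        alpha = a / (i + i0)  # sum of alpha diverges, sum of alpha^2 converges
        # g_n(z_n(i)) with g_n(z) = z, so its mean is h_n(theta_star)
        g = [h_n(n, theta_star) + rng.gauss(0.0, noise_std) for n in range(N)]
        new_x = []
        for n in range(N):
            left, right = x[(n - 1) % N], x[(n + 1) % N]
            consensus = (x[n] - left) + (x[n] - right)  # sum over ring neighbors
            innovation = h_n(n, x[n]) - g[n]            # h_n(x_n(i)) - g_n(z_n(i))
            new_x.append(x[n] - alpha * (beta * consensus + innovation))
        x = new_x
    return x


estimates = run_nu()
print(estimates)
```

In this sketch the odd-numbered sensors carry no information about $\theta^*$ by themselves; they approach the true value only through the consensus term, which is the cooperation mechanism the algorithm relies on.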



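We note that the update scheme in eqn. (113) is nonlinear and hence convergence properties can only be characterized, in general, through the existence of appropriate stochastic Lyapunov functions. In particular, if we can show that the iterative scheme in eqn. (113) falls under the purview of a general result like Theorem 5, we can establish properties like consistency, normality etc. To this end, we note that eqn. (113) can be written as

$$x(i+1) = x(i) - \alpha(i)\Big[\beta(\bar L\otimes I_M)(x(i) - 1_N\otimes\theta^*) + \big(M(x(i)) - M(1_N\otimes\theta^*)\big) + \beta(\tilde L(i)\otimes I_M)x(i) - \big(J(z(i)) - M(1_N\otimes\theta^*)\big) + \Upsilon(i) + \Psi(i)\Big] \tag{115}$$

which becomes in the notation of Theorem 5

$$x(i+1) = x(i) + \alpha(i)\left[R(x(i)) + \Gamma(i+1, x(i), \omega)\right] \tag{116}$$

where

$$R(x) = -\left[\beta(\bar L\otimes I_M)(x - 1_N\otimes\theta^*) + \big(M(x) - M(1_N\otimes\theta^*)\big)\right] \tag{117}$$

and

$$\Gamma(i+1, x, \omega) = -\left[\beta(\tilde L(i)\otimes I_M)x - \big(J(z(i)) - M(1_N\otimes\theta^*)\big) + \Upsilon(i) + \Psi(i)\right] \tag{118}$$

Consider the filtration, $\{\mathcal{F}_i\}_{i\ge 0}$,

$$\mathcal{F}_i = \sigma\left(x(0), \left\{L(j), \{z_n(j)\}_{1\le n\le N}, \Upsilon(j), \Psi(j)\right\}_{0\le j<i}\right) \tag{119}$$

Clearly, under Assumptions D.1-D.3, the state sequence $\{x(i)\}_{i\ge 0}$ generated by algorithm $\mathcal{NU}$ is Markov w.r.t. $\{\mathcal{F}_i\}_{i\ge 0}$, and the definition in eqn. (118) renders the random family $\{\Gamma(i+1, x, \omega)\}_{x\in\mathbb{R}^{NM\times 1}}$ $\mathcal{F}_{i+1}$ measurable, zero-mean, and independent of $\mathcal{F}_i$ for fixed $i+1$. Thus Assumptions B.1, B.2 of Theorem 5 are satisfied, and we have the following immediately.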

Proposition 10 ($\mathcal{NU}$: Consistency and asymptotic normality) Consider the state sequence $\{x(i)\}_{i\ge 0}$ generated by the $\mathcal{NU}$ algorithm. Let $R(x)$, $\Gamma(i+1, x, \omega)$, $\mathcal{F}_i$ be defined as in eqns. (117), (118), (119), respectively. Then, if there exists a function $V(x)$ satisfying Assumptions B.3, B.4 at $x^* = 1_N\otimes\theta^*$, the estimate sequence $\{x_n(i)\}_{i\ge 0}$ at any sensor $n$ is consistent. In other words,

$$\mathbb{P}_{\theta^*}\left[\lim_{i\to\infty} x_n(i) = \theta^*,\ \forall n\right] = 1 \tag{120}$$

If, in addition, Assumptions C.1-C.4 are satisfied, the estimate sequence $\{x_n(i)\}_{i\ge 0}$ at any sensor $n$ is asymptotically normal.

Proposition 10 states that, a.s. asymptotically, the network reaches consensus, and the estimates at each sensor converge to the true value of the parameter vector $\theta^*$. The Proposition relates these convergence properties of $\mathcal{NU}$ to the existence of suitable Lyapunov functions. For a particular observation model characterized by the corresponding functions $h_n(\cdot)$, $g_n(\cdot)$, if one can come up with an appropriate Lyapunov function satisfying the assumptions of Proposition 10, then consistency (asymptotic normality) is guaranteed. Existence of a suitable Lyapunov function is sufficient for consistency, but may not be necessary. In particular, there may be observation models for which the $\mathcal{NU}$ algorithm is consistent, but there exists no Lyapunov function satisfying the assumptions of Proposition 10.† Also, even if a suitable Lyapunov function exists, it may be difficult to guess its form, because there is no systematic (constructive) way of coming up with Lyapunov functions for generic models.

However, for our problem of interest, some additional weak assumptions on the observation model, for example, Lipschitz continuity of the functions $h_n(\cdot)$, will guarantee the existence of suitable Lyapunov functions, thus establishing convergence properties of the $\mathcal{NU}$ algorithm. The rest of this subsection studies this issue and presents different sufficient conditions on the observation model, which guarantee that the assumptions of Proposition 10 are satisfied, leading to the a.s. convergence of the $\mathcal{NU}$ algorithm. We start with a definition.

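Definition 11 (Consensus Subspace) We define the consensus subspace, $\mathcal{C}\subset\mathbb{R}^{NM\times 1}$, as

$$\mathcal{C} = \left\{y\in\mathbb{R}^{NM\times 1}\ \middle|\ y = 1_N\otimes\tilde y,\ \tilde y\in\mathbb{R}^{M\times 1}\right\} \tag{121}$$

For $y\in\mathbb{R}^{NM\times 1}$, we denote its component in $\mathcal{C}$ by $y_{\mathcal{C}}$ and its orthogonal component by $y_{\mathcal{C}^\perp}$.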
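
The decomposition in Definition 11 can be computed directly: the projection onto the consensus subspace replaces every sensor block by the average of the $N$ blocks. The following Python sketch (the function name is hypothetical) illustrates this for a stacked vector stored as a flat list.

```python
def consensus_split(y, N, M):
    """Split y in R^{NM} into its component in C and the orthogonal remainder."""
    # Averaging the N blocks of length M gives the projection onto C.
    avg = [sum(y[n * M + m] for n in range(N)) / N for m in range(M)]
    y_c = [avg[m] for _ in range(N) for m in range(M)]
    y_perp = [yi - ci for yi, ci in zip(y, y_c)]
    return y_c, y_perp


y = [1.0, 2.0, 3.0, 6.0, 5.0, 4.0]  # N = 3 sensors, M = 2 components each
y_c, y_perp = consensus_split(y, N=3, M=2)
inner = sum(c * p for c, p in zip(y_c, y_perp))
print(y_c, y_perp, inner)
```

The two components are orthogonal, so $\|y\|^2 = \|y_{\mathcal{C}}\|^2 + \|y_{\mathcal{C}^\perp}\|^2$.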

Theorem 12 ($\mathcal{NU}$: Consistency under Lipschitz on $h_n$) Let $\{x(i)\}_{i\ge 0}$ be the state sequence generated by the $\mathcal{NU}$ algorithm (Assumptions D.1-D.3). Let the functions $h_n(\cdot)$, $1\le n\le N$, be Lipschitz continuous with constants $k_n > 0$, $1\le n\le N$, respectively, i.e.,

$$\|h_n(\theta) - h_n(\tilde\theta)\| \le k_n\|\theta - \tilde\theta\|, \quad \forall\,\theta,\tilde\theta\in\mathbb{R}^{M\times 1},\ 1\le n\le N \tag{122}$$

and satisfy

$$(\theta - \tilde\theta)^T\left(h_n(\theta) - h_n(\tilde\theta)\right) \ge 0, \quad \forall\,\theta\ne\tilde\theta\in\mathbb{R}^{M\times 1},\ 1\le n\le N \tag{123}$$

Define $K$ as

$$K = \max(k_1, \cdots, k_N) \tag{124}$$

Then, for every $\beta > 0$, the estimate sequence is consistent. In other words,

$$\mathbb{P}_{\theta^*}\left[\lim_{i\to\infty} x_n(i) = \theta^*,\ \forall n\right] = 1 \tag{125}$$

Before proceeding with the proof, we note that the conditions in eqns. (122), (123) are much easier to verify than the general problem of guessing the form of the Lyapunov function. Also, as will be shown in the proof, the conditions in Theorem 12 determine a Lyapunov function explicitly, which may be used to analyze properties like convergence rate. The Lipschitz assumption is quite common in the stochastic approximation literature, while the assumption in eqn. (123) holds for a large class of functions. As a matter of fact, in the one-dimensional case ($M = 1$), it is satisfied if the functions $h_n(\cdot)$ are non-decreasing.

† This is because converse theorems in stability theory do not always hold (see [38]).

Proof: As noted earlier, Assumptions B.1, B.2 of Theorem 5 are always satisfied for the recursive scheme in eqn. (113). To prove consistency, we need to verify Assumptions B.3, B.4 only. To this end, consider the following Lyapunov function

$$V(x) = \|x - 1_N\otimes\theta^*\|^2 \tag{126}$$

Clearly,

$$V(1_N\otimes\theta^*) = 0, \quad V(x) > 0,\ x\ne 1_N\otimes\theta^*, \quad \lim_{\|x\|\to\infty} V(x) = \infty \tag{127}$$

The assumptions in eqns. (122), (123) imply that $h(\cdot)$ is Lipschitz continuous and

$$(\theta - \tilde\theta)^T\left(h(\theta) - h(\tilde\theta)\right) > 0, \quad \forall\,\theta\ne\tilde\theta\in\mathbb{R}^{M\times 1} \tag{128}$$

where eqn. (128) follows from the invertibility of $h(\cdot)$ and the fact that

$$h(\theta) = \frac{1}{N}\sum_{n=1}^{N} h_n(\theta), \quad \forall\,\theta\in\mathbb{R}^{M\times 1} \tag{129}$$

Recall the definitions of $R(x)$, $\Gamma(i+1, x, \omega)$ in eqns. (117), (118), respectively. We then have

$$\begin{aligned}
\left(R(x), V_x(x)\right) &= -2\beta(x - 1_N\otimes\theta^*)^T(\bar L\otimes I_M)(x - 1_N\otimes\theta^*) - 2(x - 1_N\otimes\theta^*)^T\left[M(x) - M(1_N\otimes\theta^*)\right]\\
&= -2\beta(x - 1_N\otimes\theta^*)^T(\bar L\otimes I_M)(x - 1_N\otimes\theta^*) - 2\sum_{n=1}^{N}(x_n - \theta^*)^T\left(h_n(x_n) - h_n(\theta^*)\right)\\
&\le 0
\end{aligned} \tag{130}$$

where the last step follows from the positive-semidefiniteness of $\bar L\otimes I_M$ and eqn. (123). To verify Assumption B.3, we need to show

$$\sup_{\epsilon<\|x - 1_N\otimes\theta^*\|<\frac{1}{\epsilon}}\left(R(x), V_x(x)\right) < 0, \quad \forall\,\epsilon > 0 \tag{131}$$

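Let us assume on the contrary that eqn. (131) is not satisfied. Then from eqn. (130) we must have, for some $\epsilon > 0$,

$$\sup_{\epsilon<\|x - 1_N\otimes\theta^*\|<\frac{1}{\epsilon}}\left(R(x), V_x(x)\right) = 0 \tag{132}$$

Then, there exists a sequence $\{x^k\}_{k\ge 0}$ in $\left\{x\in\mathbb{R}^{NM\times 1}\ \middle|\ \epsilon<\|x - 1_N\otimes\theta^*\|<\frac{1}{\epsilon}\right\}$, such that

$$\lim_{k\to\infty}\left(R(x^k), V_x(x^k)\right) = 0 \tag{133}$$

Since the set $\left\{x\in\mathbb{R}^{NM\times 1}\ \middle|\ \epsilon<\|x - 1_N\otimes\theta^*\|<\frac{1}{\epsilon}\right\}$ is relatively compact, the sequence $\{x^k\}_{k\ge 0}$ has a limit point, $\bar x$, such that $\epsilon\le\|\bar x - 1_N\otimes\theta^*\|\le\frac{1}{\epsilon}$, and from the continuity of $\left(R(x), V_x(x)\right)$, we must have

$$\left(R(\bar x), V_x(\bar x)\right) = 0 \tag{134}$$

From eqns. (123), (130), we then have

$$(\bar x - 1_N\otimes\theta^*)^T(\bar L\otimes I_M)(\bar x - 1_N\otimes\theta^*) = 0, \quad (\bar x_n - \theta^*)^T\left(h_n(\bar x_n) - h_n(\theta^*)\right) = 0,\ \forall n \tag{135}$$

The first equality in eqn. (135) and the properties of the Laplacian imply that $\bar x\in\mathcal{C}$ and hence there exists $a\in\mathbb{R}^{M\times 1}$, such that,

$$\bar x_n = a, \quad \forall n \tag{136}$$

The second set of equalities in eqn. (135) then implies

$$(a - \theta^*)^T\left(h(a) - h(\theta^*)\right) = 0 \tag{137}$$

which is a contradiction by eqn. (128), since $a\ne\theta^*$. Thus, we have eqn. (131), which verifies Assumption B.3. Finally, we note that

$$\begin{aligned}
\|R(x)\|^2 &= \left\|\beta(\bar L\otimes I_M)(x - 1_N\otimes\theta^*) + \big(M(x) - M(1_N\otimes\theta^*)\big)\right\|^2\\
&\le 4\beta^2\left\|(\bar L\otimes I_M)(x - 1_N\otimes\theta^*)\right\|^2 + 4\left\|M(x) - M(1_N\otimes\theta^*)\right\|^2\\
&\le 4\beta^2\lambda_N^2(\bar L)\|x - 1_N\otimes\theta^*\|^2 + 4K^2\|x - 1_N\otimes\theta^*\|^2
\end{aligned} \tag{138}$$

where the second step follows from the Lipschitz continuity of $h_n(\cdot)$ and $K$ is defined in eqn. (124). To verify Assumption B.4, we then have, along similar lines as in Theorem 6,

$$\begin{aligned}
\|R(x)\|^2 + E\left[\|\Gamma(i+1, x, \omega)\|^2\right] &\le k_1\left(1 + V(x)\right)\\
&\le k_1\left(1 + V(x)\right) - \left(R(x), V_x(x)\right)
\end{aligned} \tag{139}$$

for some constant $k_1 > 0$ (the last step follows from eqn. (130)). Hence, the required assumptions are satisfied and the claim follows. ■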

It follows from the proof that the Lipschitz continuity assumption in Theorem 12 can be replaced by continuity of the functions $h_n(\cdot)$, $1\le n\le N$, and linear growth conditions, i.e.,

$$\|h_n(\theta)\|^2 \le c_{n,1} + c_{n,2}\|\theta\|^2, \quad \forall\,\theta\in\mathbb{R}^{M\times 1},\ 1\le n\le N \tag{140}$$

for constants $c_{n,1}, c_{n,2} > 0$.

We now present another set of sufficient conditions that guarantee consistency of the algorithm $\mathcal{NU}$. If the observation model is separably estimable, in some cases even if the underlying model is nonlinear, it may be possible to choose the functions $g_n(\cdot)$ such that the function $h(\cdot)$ possesses nice properties. This is the subject of the next result.

Theorem 13 ($\mathcal{NU}$: Consistency under strict monotonicity on $h$) Consider the $\mathcal{NU}$ algorithm (Assumptions D.1-D.3). Suppose that the functions $g_n(\cdot)$ can be chosen such that the functions $h_n(\cdot)$ are Lipschitz continuous with constants $k_n > 0$ and the function $h(\cdot)$ satisfies

$$(\theta - \tilde\theta)^T\left(h(\theta) - h(\tilde\theta)\right) \ge \gamma\|\theta - \tilde\theta\|^2, \quad \forall\,\theta,\tilde\theta\in\mathbb{R}^{M\times 1} \tag{141}$$

for some constant $\gamma > 0$. Then, if $\beta > \frac{K^2 + K\gamma}{\gamma\lambda_2(\bar L)}$, the algorithm $\mathcal{NU}$ is consistent, i.e.,

$$\mathbb{P}_{\theta^*}\left[\lim_{i\to\infty} x_n(i) = \theta^*,\ \forall n\right] = 1 \tag{142}$$

where $K = \max(k_1, \cdots, k_N)$.

Before proceeding to the proof, we comment that, in comparison to Theorem 12, strengthening the assumptions on $h(\cdot)$, see eqn. (141), considerably weakens the assumptions on the functions $h_n(\cdot)$. Eqn. (141) is an analog of strict monotonicity. For example, if $h(\cdot)$ is linear, the left hand side of eqn. (141) becomes a quadratic and the condition says that this quadratic is strictly away from zero, i.e., monotonically increasing with rate $\gamma$.

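Proof: As noted earlier, Assumptions B.1, B.2 of Theorem 5 are always satisfied by the recursive scheme in eqn. (113). To prove consistency, we need to verify Assumptions B.3, B.4 only. To this end, consider the following Lyapunov function

$$V(x) = \|x - 1_N\otimes\theta^*\|^2 \tag{143}$$

Clearly,

$$V(1_N\otimes\theta^*) = 0, \quad V(x) > 0,\ x\ne 1_N\otimes\theta^*, \quad \lim_{\|x\|\to\infty} V(x) = \infty \tag{144}$$

Recall the definitions of $R(x)$, $\Gamma(i+1, x, \omega)$ in eqns. (117), (118), respectively, and the consensus subspace in eqn. (121). Writing $e = x - 1_N\otimes\theta^*$ and $e_{\mathcal{C}} = x_{\mathcal{C}} - 1_N\otimes\theta^*$ for brevity, we then have

$$\begin{aligned}
\left(R(x), V_x(x)\right) &= -2\beta e^T(\bar L\otimes I_M)e - 2e^T\left[M(x) - M(1_N\otimes\theta^*)\right]\\
&\le -2\beta\lambda_2(\bar L)\|x_{\mathcal{C}^\perp}\|^2 - 2e^T\left[M(x) - M(x_{\mathcal{C}})\right] - 2e^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right]\\
&\le -2\beta\lambda_2(\bar L)\|x_{\mathcal{C}^\perp}\|^2 + 2\left|e^T\left[M(x) - M(x_{\mathcal{C}})\right]\right| - 2e^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right]\\
&\le -2\beta\lambda_2(\bar L)\|x_{\mathcal{C}^\perp}\|^2 + 2K\|x_{\mathcal{C}^\perp}\|\|e\| - 2e^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right]\\
&= -2\beta\lambda_2(\bar L)\|x_{\mathcal{C}^\perp}\|^2 + 2K\|x_{\mathcal{C}^\perp}\|\|e\| - 2x_{\mathcal{C}^\perp}^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right] - 2e_{\mathcal{C}}^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right]\\
&\le -2\beta\lambda_2(\bar L)\|x_{\mathcal{C}^\perp}\|^2 + 2K\|x_{\mathcal{C}^\perp}\|\|e\| + 2\left|x_{\mathcal{C}^\perp}^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right]\right| - 2e_{\mathcal{C}}^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right]\\
&\le -2\beta\lambda_2(\bar L)\|x_{\mathcal{C}^\perp}\|^2 + 2K\|x_{\mathcal{C}^\perp}\|\|e\| + 2K\|x_{\mathcal{C}^\perp}\|\|e_{\mathcal{C}}\| - 2\gamma\|e_{\mathcal{C}}\|^2\\
&\le \left(-2\beta\lambda_2(\bar L) + 2K\right)\|x_{\mathcal{C}^\perp}\|^2 + 4K\|x_{\mathcal{C}^\perp}\|\|e_{\mathcal{C}}\| - 2\gamma\|e_{\mathcal{C}}\|^2
\end{aligned} \tag{145}$$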
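where the second to last step is justified because $x_{\mathcal{C}} = 1_N\otimes y$ for some $y\in\mathbb{R}^{M\times 1}$ and

$$\begin{aligned}
(x_{\mathcal{C}} - 1_N\otimes\theta^*)^T\left[M(x_{\mathcal{C}}) - M(1_N\otimes\theta^*)\right] &= \sum_{n=1}^{N}(y - \theta^*)^T\left[h_n(y) - h_n(\theta^*)\right]\\
&= (y - \theta^*)^T\sum_{n=1}^{N}\left[h_n(y) - h_n(\theta^*)\right]\\
&= N(y - \theta^*)^T\left[h(y) - h(\theta^*)\right]\\
&\ge N\gamma\|y - \theta^*\|^2\\
&= \gamma\|x_{\mathcal{C}} - 1_N\otimes\theta^*\|^2
\end{aligned} \tag{146}$$

It can be shown that, if $\beta > \frac{K^2 + K\gamma}{\gamma\lambda_2(\bar L)}$, the term on the R.H.S. of eqn. (145) is always non-positive. We thus have

$$\left(R(x), V_x(x)\right) \le 0, \quad \forall\,x\in\mathbb{R}^{NM\times 1} \tag{147}$$

By the continuity of $\left(R(x), V_x(x)\right)$ and the relative compactness of $\left\{x\in\mathbb{R}^{NM\times 1}\ \middle|\ \epsilon<\|x - 1_N\otimes\theta^*\|<\frac{1}{\epsilon}\right\}$, we can show along similar lines as in Theorem 12 that

$$\sup_{\epsilon<\|x - 1_N\otimes\theta^*\|<\frac{1}{\epsilon}}\left(R(x), V_x(x)\right) < 0, \quad \forall\,\epsilon > 0 \tag{148}$$

verifying Assumption B.3. Assumption B.4 can be verified in an exactly similar manner as in Theorem 12 and the result follows. ■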
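
The role of the lower bound on $\beta$ can be checked numerically. Viewing the last line of eqn. (145) as a quadratic form $q(u, v) = (-2\beta\lambda_2 + 2K)u^2 + 4Kuv - 2\gamma v^2$ in $u = \|x_{\mathcal{C}^\perp}\|$ and $v = \|x_{\mathcal{C}} - 1_N\otimes\theta^*\|$, a discriminant argument suggests that $q \le 0$ for all $u, v$ exactly when $\beta \ge (K^2 + K\gamma)/(\gamma\lambda_2)$. The Python sketch below tests this reading on a grid; the threshold expression is our derivation from eqn. (145), and the constants and function names are hypothetical.

```python
def beta_threshold(K, gamma, lambda2):
    """Threshold obtained from the discriminant of the quadratic form."""
    return (K * K + K * gamma) / (gamma * lambda2)


def quadratic_bound(beta, K, gamma, lambda2, u, v):
    """Last line of (145) with u = ||x_perp|| and v = ||x_C - 1 (x) theta*||."""
    return (-2.0 * beta * lambda2 + 2.0 * K) * u * u + 4.0 * K * u * v - 2.0 * gamma * v * v


K, gamma, lambda2 = 1.5, 0.5, 1.0  # illustrative constants
beta_star = beta_threshold(K, gamma, lambda2)
grid = [0.1 * k for k in range(51)]

worst_above = max(quadratic_bound(1.01 * beta_star, K, gamma, lambda2, u, v)
                  for u in grid for v in grid)
worst_below = max(quadratic_bound(0.9 * beta_star, K, gamma, lambda2, u, v)
                  for u in grid for v in grid)
print(beta_star, worst_above, worst_below)
```

Slightly above the threshold the bound stays non-positive on the whole grid, while slightly below it the bound becomes positive near the direction $v = (K/\gamma)u$, so the Lyapunov argument no longer goes through with this particular bound.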

IV. Nonlinear Observation Models: Algorithm $\mathcal{NLU}$
In this section, we present the algorithm $\mathcal{NLU}$ for distributed estimation in separably estimable observation models. As will be explained later, this is a mixed time-scale algorithm, where the consensus time-scale dominates the observation update time-scale as time progresses. The $\mathcal{NLU}$ algorithm is based on the fact that, for separably estimable models, it suffices to know $h(\theta^*)$, because $\theta^*$ can be unambiguously determined from the invertible function $h(\theta^*)$. To be precise, if the function $h(\cdot)$ has a continuous inverse, then any iterative scheme converging to $h(\theta^*)$ will lead to consistent estimates, obtained by inverting the sequence of iterates. The algorithm $\mathcal{NLU}$ is shown to yield consistent and unbiased estimators at each sensor for any separably estimable model, under the assumption that the function $h(\cdot)$ has a continuous inverse. Thus, the algorithm $\mathcal{NLU}$ presents a more reliable alternative than the algorithm $\mathcal{NU}$, because, as shown in Subsection III-B, the convergence properties of the latter can be guaranteed only under certain assumptions on the observation model. We briefly comment on the organization of this section. The $\mathcal{NLU}$ algorithm for separably estimable observation models is presented in Subsection IV-A. Subsection IV-B offers interpretations of the $\mathcal{NLU}$ algorithm and presents the main results regarding consistency, mean-square convergence, and asymptotic unbiasedness proved in the paper. In Subsection IV-C, we prove the main results about the $\mathcal{NLU}$ algorithm and provide insights behind the analysis (in particular, why standard stochastic approximation results cannot be used directly to give its convergence properties). Finally, Section V presents discussions on the $\mathcal{NLU}$ algorithm and suggests future research directions.
A. Algorithm NLU 

Algorithm NLU: Let $x(0) = [x_1^T \cdots x_N^T]^T$ be the initial set of states (estimates) at the sensors. The NLU 
generates the state sequence $\{x_n(i)\}_{i \geq 0}$ at the n-th sensor according to the following distributed recursive scheme: 

$$x_n(i+1) = h^{-1}\Big( h(x_n(i)) - \beta(i) \sum_{l \in \Omega_n(i)} \big( h(x_n(i)) - q(h(x_l(i)) + \nu_{nl}(i)) \big) - \alpha(i) \big( h(x_n(i)) - g_n(z_n(i)) \big) \Big) \qquad (149)$$

based on the information, $x_n(i)$, $\{q(h(x_l(i)) + \nu_{nl}(i))\}_{l \in \Omega_n(i)}$, $z_n(i)$, available to it at time i (we assume that at 
time i sensor l sends a quantized version of $h(x_l(i)) + \nu_{nl}(i)$ to sensor n). Here $h^{-1}(\cdot)$ denotes the inverse of the 
function $h(\cdot)$ and $\{\beta(i)\}_{i \geq 0}$, $\{\alpha(i)\}_{i \geq 0}$ are appropriately chosen weight sequences. In the sequel, we analyze the 
NLU algorithm under the model Assumptions D.1-D.3, and in addition we assume: 
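D.4): There exists $\epsilon_1 > 0$, such that the following moment exists: 

$$\mathbb{E}_{\theta}\left[ \left\| J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right\|^{2+\epsilon_1} \right] = \kappa(\theta) < \infty, \quad \forall \theta \in \mathcal{U} \qquad (150)$$

The above moment condition is stronger than the moment assumption required by the NU algorithm in eqn. (111), 
where only existence of the quadratic moment was assumed. 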
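We also define 

$$\mathbb{E}_{\theta}\left[ \left\| J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right\| \right] = \kappa_1(\theta) < \infty, \quad \forall \theta \in \mathcal{U} \qquad (151)$$

$$\mathbb{E}_{\theta}\left[ \left\| J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right\|^{2} \right] = \kappa_2(\theta) < \infty, \quad \forall \theta \in \mathcal{U} \qquad (152)$$

D.5): The weight sequences $\{\alpha(i)\}_{i \geq 0}$, $\{\beta(i)\}_{i \geq 0}$ are given by 

$$\alpha(i) = \frac{a}{(i+1)^{\tau_1}}, \qquad \beta(i) = \frac{b}{(i+1)^{\tau_2}} \qquad (153)$$

where $a, b > 0$ are constants. We assume the following: 

$$0.5 < \tau_1, \tau_2 \leq 1, \qquad \tau_1 > \frac{1}{2+\epsilon_1} + \tau_2, \qquad 2\tau_2 > \tau_1 \qquad (154)$$

We note that under Assumption D.4 that $\epsilon_1 > 0$, such weight sequences always exist. As an example, if $\frac{1}{2+\epsilon_1} = .49$, 
then the choice $\tau_1 = 1$ and $\tau_2 = .505$ satisfies the inequalities in eqn. (154). 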
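
As a quick sanity check of the exponent conditions in eqn. (154), the short sketch below tests a candidate pair of exponents. The function name `satisfies_d5` is hypothetical, and the upper end of the admissible range is treated as inclusive (an assumption, made so that the example value $\tau_1 = 1$ quoted above is admitted).

```python
def satisfies_d5(tau1, tau2, eps1):
    """Check the exponent conditions of eqn. (154) for alpha(i) = a/(i+1)^tau1
    and beta(i) = b/(i+1)^tau2. The inclusive upper bound is an assumption."""
    in_range = 0.5 < tau1 <= 1.0 and 0.5 < tau2 <= 1.0
    # The observation exponent must exceed the consensus exponent by a
    # margin set by the moment condition D.4.
    gap = tau1 > 1.0 / (2.0 + eps1) + tau2
    # The two time-scales must not be too far apart.
    not_too_separated = 2.0 * tau2 > tau1
    return in_range and gap and not_too_separated


# Example from the text: 1/(2 + eps1) = .49, tau1 = 1, tau2 = .505.
eps1 = 1.0 / 0.49 - 2.0
print(satisfies_d5(1.0, 0.505, eps1))
print(satisfies_d5(0.8, 0.7, eps1))
```

The second call illustrates a pair that violates the gap condition, since $0.8$ does not exceed $0.49 + 0.7$.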

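D.6): The function $h(\cdot)$ has a continuous inverse, denoted by $h^{-1}(\cdot)$ in the sequel. 
To write the NLU in a more compact form, we introduce the transformed state sequence, $\{\tilde{x}(i)\}_{i \geq 0}$, where 
$\tilde{x}(i) = [\tilde{x}_1^T(i) \cdots \tilde{x}_N^T(i)]^T \in \mathbb{R}^{NM \times 1}$ and the iterations are given by 

$$\tilde{x}(i+1) = \tilde{x}(i) - \beta(i) (L(i) \otimes I_M) \tilde{x}(i) - \alpha(i) [\tilde{x}(i) - J(z(i))] - \beta(i) (\Upsilon(i) + \Psi(i)) \qquad (155)$$

$$x(i) = \left[ (h^{-1}(\tilde{x}_1(i)))^T \cdots (h^{-1}(\tilde{x}_N(i)))^T \right]^T \qquad (156)$$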
Here $\Upsilon(i)$, $\Psi(i)$ model the dithered quantization error effects as in algorithm NU. The update model in eqn. (155) 
is a mixed time-scale procedure, where the consensus time-scale is determined by the weight sequence $\{\beta(i)\}_{i \geq 0}$. 
On the other hand, the observation update time-scale is governed by the weight sequence $\{\alpha(i)\}_{i \geq 0}$. It follows 
from Assumption D.5 that $\tau_1 > \tau_2$, which in turn implies $\beta(i)/\alpha(i) \to \infty$ as $i \to \infty$. Thus, the consensus time-scale 
dominates the observation update time-scale as the algorithm progresses, making it a mixed time-scale algorithm 
that does not directly fall under the purview of stochastic approximation results like Theorem 5. Also, the presence 
of the random link failures and quantization noise (which operate at the same time-scale as the consensus update) 
precludes standard approaches like time-scale separation for the limiting system. 
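
To make the transformed-domain recursion of eqn. (155) concrete, the following is a minimal simulation sketch under strong simplifying assumptions that are not part of the paper: a scalar parameter, a fixed ring network with no link failures, no quantization or dither, the choice $g_n(z_n) = z_n$, and a hypothetical cubic observation model with per-sensor gains `c`. The function name `run_nlu` and all numerical values are illustrative only.

```python
import numpy as np


def run_nlu(theta_star=1.5, n_iter=20000, seed=0):
    """Toy run of the linear transformed-domain update, followed by inversion."""
    rng = np.random.default_rng(seed)
    c = np.array([0.0, 1.0, 2.0, 1.0, 1.0])  # hypothetical gains; sensor 0 sees only noise
    N = c.size

    # Laplacian of a fixed ring graph (assumption: no random link failures).
    A = np.zeros((N, N))
    for n in range(N):
        A[n, (n + 1) % N] = 1.0
        A[n, (n - 1) % N] = 1.0
    L = np.diag(A.sum(axis=1)) - A

    # Hypothetical model: E[z_n] = c_n * theta^3, so h(theta) = mean(c) * theta^3.
    def h_inv(y):
        return np.cbrt(y / c.mean())

    a, b, tau1, tau2 = 1.0, 0.2, 1.0, 0.505  # weights of the form in eqn. (153)
    x_t = np.zeros(N)                         # transformed states, one per sensor
    for i in range(n_iter):
        alpha = a / (i + 1) ** tau1
        beta = b / (i + 1) ** tau2
        z = c * theta_star ** 3 + 0.5 * rng.standard_normal(N)  # g_n(z_n) = z_n
        # Consensus step plus observation (innovation) step, both linear in x_t.
        x_t = x_t - beta * (L @ x_t) - alpha * (x_t - z)
    return h_inv(x_t)  # map back to parameter estimates, as in eqn. (156)


estimates = run_nlu()
print(estimates)
```

In this toy setting no single sensor can recover the parameter from its own mean observation through the network-wide $h(\cdot)$, yet every entry of `estimates` approaches `theta_star` because the consensus term mixes the per-sensor information.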

B. Algorithm NLU: Discussions and Main Results 

We comment on the NLU algorithm. As is clear from eqns. (155)-(156), the NLU algorithm operates in a 
transformed domain. As a matter of fact, the function $h(\cdot)$ (c.f. Definition 9) can be viewed as an invertible 
transformation on the parameter space $\mathcal{U}$. The transformed state sequence, $\{\tilde{x}(i)\}_{i \geq 0}$, is then a transformation of 
the estimate sequence $\{x(i)\}_{i \geq 0}$, and, as seen from eqn. (155), the evolution of the sequence $\{\tilde{x}(i)\}_{i \geq 0}$ is linear. This 
is an important feature of the NLU algorithm, which is linear in the transformed domain, although the underlying 
observation model is nonlinear. Intuitively, this approach can be thought of as a distributed stochastic version of 
homomorphic filtering (see [39]), where, by suitably transforming the state space, linear filtering is performed on 
a certain nonlinear filtering problem. In our case, for models of the separably estimable type, the function $h(\cdot)$ 
plays the role of the analogous transformation in homomorphic filtering, and in this transformed space, one can 
design linear estimation algorithms with desirable properties. This makes the NLU algorithm significantly different 
from algorithm NU, which operates on the untransformed space and is nonlinear. This linear property 
of the NLU algorithm in the transformed domain leads to nice statistical properties (for example, consistency and 
asymptotic unbiasedness) under much weaker assumptions on the observation model than those required by the nonlinear 
NU algorithm. 

We now state the main results about the NLU algorithm, to be developed in the paper. We show that, if the 
observation model is separably estimable, then, in the transformed domain, the NLU algorithm is consistent. More 
specifically, if $\theta^*$ is the true (but unknown) parameter value, then the transformed sequence $\{\tilde{x}(i)\}_{i \geq 0}$ converges 
a.s. and in mean-squared sense to $h(\theta^*)$. We note that, unlike the NU algorithm, this only requires the observation 
model to be separably estimable and imposes no other conditions on the functions $h_n(\cdot)$, $h(\cdot)$. We summarize these in the 
following theorem. 

Theorem 14 Consider the NLU algorithm under the Assumptions D.1-D.5, and the sequence $\{\tilde{x}(i)\}_{i \geq 0}$ generated 
according to eqn. (155). We then have 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} \tilde{x}_n(i) = h(\theta^*), \ \forall 1 \leq n \leq N \right] = 1 \qquad (157)$$

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \tilde{x}_n(i) - h(\theta^*) \|^2 \right] = 0, \quad \forall 1 \leq n \leq N \qquad (158)$$

In particular, 

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \tilde{x}_n(i) \right] = h(\theta^*), \quad \forall 1 \leq n \leq N \qquad (159)$$

In other words, in the transformed domain, the estimate sequence $\{\tilde{x}_n(i)\}_{i \geq 0}$ at sensor n is consistent, asymptotically 
unbiased and converges in mean-squared sense to $h(\theta^*)$. 

As an immediate consequence of Theorem 14, we have the following result, which characterizes the statistical 
properties of the untransformed state sequence $\{x(i)\}_{i \geq 0}$. 

Theorem 15 Consider the NLU algorithm under the Assumptions D.1-D.6. Let $\{x(i)\}_{i \geq 0}$ be the state sequence 
generated, as given by eqns. (155)-(156). We then have 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} x_n(i) = \theta^*, \ \forall 1 \leq n \leq N \right] = 1 \qquad (160)$$

In other words, the NLU algorithm is consistent. 

If, in addition, the function $h^{-1}(\cdot)$ is Lipschitz continuous, the NLU algorithm is asymptotically unbiased, i.e., 

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ x_n(i) \right] = \theta^*, \quad \forall 1 \leq n \leq N \qquad (161)$$

The next subsection is concerned with the proofs of Theorems 14, 15. 

C. Consistency and Asymptotic Unbiasedness of NLU: Proofs of Theorems 14, 15 

The present subsection is devoted to proving the consistency and unbiasedness of the NLU algorithm under the 
stated Assumptions. The proof is lengthy and we start by explaining why standard stochastic approximation results 
like Theorem 5 do not apply directly. A careful inspection shows that there are essentially two different time-scales 
embedded in eqn. (155). The consensus time-scale is determined by the weight sequence $\{\beta(i)\}_{i \geq 0}$, whereas the 
observation update time-scale is governed by the weight sequence $\{\alpha(i)\}_{i \geq 0}$. It follows from Assumption D.5 that 
$\tau_1 > \tau_2$, which, in turn, implies $\beta(i)/\alpha(i) \to \infty$ as $i \to \infty$. Thus, the consensus time-scale dominates the observation 
update time-scale as the algorithm progresses, making it a mixed time-scale algorithm that does not directly fall under 
the purview of stochastic approximation results like Theorem 5. Also, the presence of the random link failures and 
quantization noise (which operate at the same time-scale as the consensus update) precludes standard approaches 
like time-scale separation for the limiting system. 

Finally, we note that standard stochastic approximation results assume that the state evolution follows a stable 
deterministic system perturbed by zero-mean stochastic noise. More specifically, if $\{y(i)\}_{i \geq 0}$ is the sequence of interest, 
Theorem 5 assumes that $\{y(i)\}_{i \geq 0}$ evolves as 

$$y(i+1) = y(i) + \gamma(i) \left[ R(y(i)) + \Gamma(i+1, \omega, y(i)) \right] \qquad (162)$$

where $\{\gamma(i)\}_{i \geq 0}$ is the weight sequence and $\Gamma(i+1, \omega, y(i))$ is the zero-mean noise. If the sequence $\{y(i)\}_{i \geq 0}$ is 
supposed to converge to $y_0$, it further assumes that $R(y_0) = 0$ and $y_0$ is a stable equilibrium of the deterministic 
system 

$$y_d(i+1) = y_d(i) + \gamma(i) R(y_d(i)) \qquad (163)$$

The NU algorithm (and its linear version, LU) falls under the purview of this, and we can establish convergence 
properties using standard stochastic approximation (see Sections II, III-A). However, the NLU algorithm cannot be 
represented in the form of eqn. (162), even ignoring the presence of multiple time-scales. Indeed, as established by 
Theorem 14, the sequence $\{\tilde{x}(i)\}_{i \geq 0}$ is supposed to converge to $1_N \otimes h(\theta^*)$ a.s. and hence, writing eqn. (155) as 
a stochastically perturbed system around $1_N \otimes h(\theta^*)$, we have 

$$\tilde{x}(i+1) = \tilde{x}(i) + \gamma(i) \left[ R(\tilde{x}(i)) + \Gamma(i+1, \omega, \tilde{x}(i)) \right] \qquad (164)$$

where, 

$$R(\tilde{x}(i)) = -\beta(i) (\bar{L} \otimes I_M) (\tilde{x}(i) - 1_N \otimes h(\theta^*)) - \alpha(i) (\tilde{x}(i) - 1_N \otimes h(\theta^*)) \qquad (165)$$

and 

$$\Gamma(i+1, \omega, \tilde{x}(i)) = -\beta(i) (\tilde{L}(i) \otimes I_M) (\tilde{x}(i) - 1_N \otimes h(\theta^*)) - \beta(i) (\Upsilon(i) + \Psi(i)) + \alpha(i) (J(z(i)) - 1_N \otimes h(\theta^*)) \qquad (166)$$

Although $R(1_N \otimes h(\theta^*)) = 0$ in the above decomposition, the noise $\Gamma(i+1, \omega, \tilde{x}(i))$ is not unbiased, as the term 
$(J(z(i)) - 1_N \otimes h(\theta^*))$ is not zero-mean. 
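
The bias of this perturbation term can be illustrated numerically. The sketch below is not from the paper: it assumes a hypothetical scalar model in which sensor n observes $z_n$ with mean $c_n (\theta^*)^3$ and $g_n(z_n) = z_n$, so that $h(\theta^*)$ is the network average of the per-sensor means. The function name `noise_bias` and the gains `c` are illustrative.

```python
import numpy as np


def noise_bias(theta_star=1.5, n_samples=100000, seed=1):
    """Empirical mean of J(z(i)) - 1_N (x) h(theta*) in a toy scalar model."""
    rng = np.random.default_rng(seed)
    c = np.array([0.0, 1.0, 2.0, 1.0, 1.0])  # hypothetical per-sensor gains
    h_n = c * theta_star ** 3                 # per-sensor means E[g_n(z_n)]
    h = h_n.mean()                            # network-wide h(theta*)

    # Independent draws of the stacked observation vector J(z(i)).
    samples = h_n + 0.5 * rng.standard_normal((n_samples, c.size))
    bias_per_sensor = samples.mean(axis=0) - h   # generally nonzero entry by entry
    bias_network = bias_per_sensor.mean()        # averages out across the network
    return bias_per_sensor, bias_network


per_sensor, network = noise_bias()
print(per_sensor, network)
```

Each sensor's term is biased unless its own mean happens to equal $h(\theta^*)$, while the network average of the bias vanishes; this is the structure that the averaged sequence exploits later in the proof.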

With the above discussion in mind, we proceed to the proof of Theorems 14, 15, which we develop in stages. 
The detailed proofs of the intermediate results are provided in the Appendix. 

In parallel to the evolution of the state sequence $\{\tilde{x}(i)\}_{i \geq 0}$, we consider the following update of the auxiliary 
sequence, $\{\tilde{x}^\circ(i)\}_{i \geq 0}$: 

$$\tilde{x}^\circ(i+1) = \tilde{x}^\circ(i) - \beta(i) (\bar{L} \otimes I_M) \tilde{x}^\circ(i) - \alpha(i) [\tilde{x}^\circ(i) - J(z(i))] \qquad (167)$$

with $\tilde{x}^\circ(0) = \tilde{x}(0)$. Note that in (167) the random Laplacian $L(i)$ is replaced by the average Laplacian $\bar{L}$ and the 
quantization noises $\Upsilon(i)$ and $\Psi(i)$ are not included. In other words, in the absence of link failures and quantization, 
the recursion (155) reduces to (167), i.e., the sequences $\{\tilde{x}(i)\}_{i \geq 0}$ and $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ are the same. 

Now consider the sequence whose recursion adds as input to the recursion in (167) the quantization noises $\Upsilon(i)$ 
and $\Psi(i)$. In other words, in the absence of link failures, but with quantization included, define similarly the 
sequence $\{\hat{x}(i)\}_{i \geq 0}$ given by 

$$\hat{x}(i+1) = \hat{x}(i) - \beta(i) (\bar{L} \otimes I_M) \hat{x}(i) - \alpha(i) [\hat{x}(i) - J(z(i))] - \beta(i) (\Upsilon(i) + \Psi(i)) \qquad (168)$$

with $\hat{x}(0) = \tilde{x}(0)$. Like before, the recursions (155)-(156) will reduce to (168) when there are no link failures. 
However, notice that in (168) the quantization noise sequences $\Upsilon(i)$ and $\Psi(i)$ are the sequences resulting from 
quantizing $\tilde{x}(i)$ in (155) and not from quantizing $\hat{x}(i)$ in (168). 
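Define the instantaneous averages over the network as 

$$\tilde{x}_{avg}(i) = \frac{1}{N} \sum_{n=1}^{N} \tilde{x}_n(i) = \frac{1}{N} (1_N \otimes I_M)^T \tilde{x}(i) \qquad (169)$$

$$\tilde{x}^\circ_{avg}(i) = \frac{1}{N} \sum_{n=1}^{N} \tilde{x}^\circ_n(i) = \frac{1}{N} (1_N \otimes I_M)^T \tilde{x}^\circ(i) \qquad (170)$$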
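
The Kronecker-product form of the network average can be checked on a small example. The sketch below uses arbitrary illustrative sizes (N = 4 sensors, M = 2 components) and a path-graph Laplacian, none of which come from the paper. It also verifies two facts used in the appendices: the consensus term drops out of the averaged recursion because $(1_N \otimes I_M)^T (\bar{L} \otimes I_M) = 0$, and the matrix $P = \frac{1}{N} (1_N \otimes I_M)(1_N \otimes I_M)^T$ maps a stacked state to N copies of its average.

```python
import numpy as np

N, M = 4, 2                                        # illustrative sizes only
ones_kron = np.kron(np.ones((N, 1)), np.eye(M))    # 1_N (x) I_M, shape (N*M, M)


def network_average(x):
    """Average of the N stacked M-dimensional blocks of x."""
    return ones_kron.T @ x / N


# Laplacian of a path graph on N nodes (any graph Laplacian would do here).
A = np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
L = np.diag(A.sum(axis=1)) - A

x = np.arange(N * M, dtype=float)      # blocks (0,1), (2,3), (4,5), (6,7)
avg = network_average(x)               # block-wise mean of the four blocks

# Left-multiplying the consensus operator by (1_N (x) I_M)^T gives zero.
consensus_in_average = ones_kron.T @ np.kron(L, np.eye(M))

# Projector onto the consensus subspace.
P = ones_kron @ ones_kron.T / N
print(avg, np.abs(consensus_in_average).max(), P @ x)
```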



We sketch the main steps of the proof here. While proving consistency and mean-squared sense convergence, 
we first show that the average sequence, $\{\tilde{x}^\circ_{avg}(i)\}_{i \geq 0}$, converges a.s. to $h(\theta^*)$. This can be done by invoking 
standard stochastic approximation arguments. Then we show that the sequence $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ reaches consensus a.s., 
and clearly the limiting consensus value must be $h(\theta^*)$. Intuitively, the a.s. consensus comes from the fact that, 
after a sufficiently large number of iterations, the consensus effect dominates over the observation update effect, 
thus asymptotically leading to consensus. The final step in the proof uses a series of comparison arguments to show 
that the sequence $\{\tilde{x}(i)\}_{i \geq 0}$ also reaches consensus a.s. with $h(\theta^*)$ as the limiting consensus value. 
We now detail the proofs of Theorems 14, 15 in the following steps. 

I: The first step consists of studying the convergence properties of the sequence $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ (see eqn. (167)), 
for which we establish the following result. 



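Lemma 16 Consider the sequence, $\{\tilde{x}^\circ(i)\}_{i \geq 0}$, given by eqn. (167), under the Assumptions D.1-D.5. Then, 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} \tilde{x}^\circ(i) = 1_N \otimes h(\theta^*) \right] = 1 \qquad (171)$$

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ(i) - 1_N \otimes h(\theta^*) \|^2 \right] = 0 \qquad (172)$$

Lemma 16 says that the sequence $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ converges a.s. and in $\mathcal{L}_2$ to $1_N \otimes h(\theta^*)$. For proving Lemma 16 we 
first consider the corresponding average sequence $\{\tilde{x}^\circ_{avg}(i)\}_{i \geq 0}$ (see eqn. (170)). For the sequence $\{\tilde{x}^\circ_{avg}(i)\}_{i \geq 0}$, 
we can invoke stochastic approximation algorithms to prove that it converges a.s. and in $\mathcal{L}_2$ to $h(\theta^*)$. This is 
carried out in Lemma 17, which we state now. 

Lemma 17 Consider the sequence, $\{\tilde{x}^\circ_{avg}(i)\}_{i \geq 0}$, given by eqn. (170), under the Assumptions D.1-D.5. Then, 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} \tilde{x}^\circ_{avg}(i) = h(\theta^*) \right] = 1 \qquad (173)$$

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ_{avg}(i) - h(\theta^*) \|^2 \right] = 0 \qquad (174)$$

In the proof of Lemma 16 we show that the sequence $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ reaches consensus a.s. and in $\mathcal{L}_2$, which together with 
Lemma 17 establishes the claim in Lemma 16 (see Appendix II for detailed proofs of Lemmas 17, 16). 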
The arguments in Lemmas 17, 16 and subsequent results require the following property of real number sequences, 
which we state here (see Appendix I for proof). 

Lemma 18 Let the sequences $\{r_1(i)\}_{i \geq 0}$ and $\{r_2(i)\}_{i \geq 0}$ be given by 

$$r_1(i) = \frac{a_1}{(i+1)^{\delta_1}}, \qquad r_2(i) = \frac{a_2}{(i+1)^{\delta_2}} \qquad (175)$$

where $a_1, a_2, \delta_2 \geq 0$ and $0 \leq \delta_1 \leq 1$. Then, if $\delta_1 = \delta_2$, there exists $B > 0$, such that, for sufficiently large 
non-negative integers, $j < i$, 

$$\sum_{k=j}^{i-1} \left[ \left( \prod_{l=k+1}^{i-1} (1 - r_1(l)) \right) r_2(k) \right] < B \qquad (176)$$

Moreover, the constant B can be chosen independently of i, j. Also, if $\delta_1 < \delta_2$, then, for arbitrary fixed j, 

$$\lim_{i \to \infty} \sum_{k=j}^{i-1} \left[ \left( \prod_{l=k+1}^{i-1} (1 - r_1(l)) \right) r_2(k) \right] = 0 \qquad (177)$$

(We use the convention that $\prod_{l=k+1}^{i-1} (1 - r_1(l)) = 1$ for $k = i - 1$.) 

We note that Lemma 18 essentially studies stability of time-varying deterministic scalar recursions of the form: 

$$y(i+1) = (1 - r_1(i)) y(i) + r_2(i) \qquad (178)$$

where $\{y(i)\}_{i \geq 0}$ is a scalar sequence evolving according to eqn. (178) with $y(0) = 0$, and the sequences $\{r_1(i)\}_{i \geq 0}$ 
and $\{r_2(i)\}_{i \geq 0}$ are given by eqn. (175). 
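
The two regimes of Lemma 18 can be observed numerically by unrolling the scalar recursion: starting from $y(0) = 0$ and iterating $y(i+1) = (1 - r_1(i)) y(i) + r_2(i)$ reproduces the weighted sum in the lemma with $j = 0$. The sketch below is illustrative only; the function name `weighted_sum` and the chosen constants are assumptions, with $a_1 = 1$ so that $r_1(i) \leq 1$ for all $i$.

```python
def weighted_sum(a1, delta1, a2, delta2, n_iter):
    """Evaluate sum_{k=0}^{n-1} [prod_{l=k+1}^{n-1} (1 - r1(l))] r2(k), n = n_iter,
    by running y(i+1) = (1 - r1(i)) * y(i) + r2(i) from y(0) = 0."""
    y = 0.0
    for i in range(n_iter):
        r1 = a1 / (i + 1) ** delta1
        r2 = a2 / (i + 1) ** delta2
        y = (1.0 - r1) * y + r2
    return y


# Equal exponents: the sum stays bounded as the horizon grows.
bounded = [weighted_sum(1.0, 0.6, 1.0, 0.6, n) for n in (10**3, 10**4, 10**5)]

# delta1 < delta2: the sum decays toward zero as the horizon grows.
decaying = [weighted_sum(1.0, 0.6, 1.0, 1.2, n) for n in (10**3, 10**4, 10**5)]

print(bounded)
print(decaying)
```

In the proofs that follow, the lemma is applied with $r_1$ playing the role of the contraction rate and $r_2$ the size of the per-step perturbation.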



II: In this step, we study the convergence properties of the sequence $\{\hat{x}(i)\}_{i \geq 0}$ (see eqn. (168)), for which we 
establish the following result. 

Lemma 19 Consider the sequence $\{\hat{x}(i)\}_{i \geq 0}$ given by eqn. (168) under the Assumptions D.1-D.5. We have 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} \hat{x}(i) = 1_N \otimes h(\theta^*) \right] = 1 \qquad (179)$$

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \hat{x}(i) - 1_N \otimes h(\theta^*) \|^2 \right] = 0 \qquad (180)$$

The proof of Lemma 19 is given in Appendix III and mainly consists of a comparison argument involving the 
sequences $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ and $\{\hat{x}(i)\}_{i \geq 0}$. 

III: This is the final step in the proofs of Theorems 14, 15. The proof of Theorem 14 consists of a comparison 
argument between the sequences $\{\tilde{x}(i)\}_{i \geq 0}$ and $\{\hat{x}(i)\}_{i \geq 0}$, which is detailed in Appendix IV. The proof of 
Theorem 15, also detailed in Appendix IV, is a consequence of Theorem 14 and the Assumptions. 
V. Conclusion 

This paper studies linear and nonlinear distributed (vector) parameter estimation problems as may arise in 
constrained sensor networks. Our problem statement is quite general, including communication among sensors that 
is quantized, noisy, and with channels that fail at random times. These are characteristic of packet communication 
in wireless sensor networks. We introduce a generic observability condition, the separable estimability condition, 
that generalizes to distributed estimation the general observability condition of centralized parameter estimation. 
We study three recursive distributed estimators, LU, NU, and NLU. We study their asymptotic properties, 
namely: consistency, asymptotic unbiasedness, and for the LU and NU algorithms their asymptotic normality. The 
NLU works in a transformed domain where the recursion is actually linear, and a final nonlinear transformation, 
justified by the separable estimability condition, recovers the parameter estimate (a stochastic generalization of 
homomorphic filtering). For example, Theorem 14 shows that, in the transformed domain, the NLU leads to 
consistent and asymptotically unbiased estimators at every sensor for all separably estimable observation models. 
Since the function $h(\cdot)$ is invertible, for practical purposes, knowledge of $h(\theta^*)$ is sufficient for knowing $\theta^*$. In that 
respect, the algorithm NLU is much more widely applicable than the algorithm NU, which requires further assumptions 
on the observation model for the existence of consistent and asymptotically unbiased estimators. However, when 
the algorithm NU is applicable, it provides convergence rate guarantees (for example, asymptotic normality) which 
follow from standard stochastic approximation theory. On the other hand, the algorithm NLU does not fall under 
the purview of standard stochastic approximation theory (see Subsection IV-C) and hence does not inherit these 
convergence rate properties. In this paper, we presented a convergence theory (a.s. and $\mathcal{L}_2$) of the three algorithms 
under broad conditions. An interesting future research direction is to establish a convergence rate theory for the 
NLU algorithm (and, in general, distributed stochastic algorithms of this form, which involve mixed time-scale 
behavior and biased perturbations). 

Appendix I 
Proof of Lemma 18 

Proof: [Proof of Lemma 18] We prove for the case $\delta_1 < 1$ first. Consider j sufficiently large, such that, 

$$r_1(i) \leq 1, \quad \forall i \geq j \qquad (181)$$

Then, for $k \geq j$, using the inequality, $1 - a \leq e^{-a}$, for $0 \leq a \leq 1$, we have 

i-\ 
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where Ci — C\{j) > for sufficiently large j. From eqns. (11851186b we have 



E 

k=3 



n il-n{l))]r,{i) 



026 



(fc+1)^ 



< 



dt 



(i + l)* 



J+2 



T^*'"'l 1 
t*2 



< 



2*^a2 


2 '02 Jj-+2 


T^t'-'l 1 ■ 
gl-5l ^_ 


dt 




ai Jj+2 


el--*! 


-^1 1 


dt + Ci 



(187) 



It is clear that the second term stays bounded if $\delta_1 = \delta_2$ and goes to zero as $i \to \infty$ if $\delta_1 < \delta_2$, thus establishing 
the Lemma for the case $\delta_1 < 1$. Also, in the case $\delta_1 = \delta_2$, we have from eqn. (187) 



E 



n (l-ri(O) r2(z) 



< 



2-^^02 



2'*^a2 



ai + Ci 



'i+2 



T^*'"''i 1 



< 2^^a2 + ^ 



thus making the choice of B in eqn. (176) independent of i, j. 
Now consider the case $\delta_1 = 1$. Consider j sufficiently large, such that, 



dt 



(188) 



ri(i) < 1, yi > j 



(189) 



Using a similar set of manipulations for k > j, we have 



We thus have 



l=k+l 



(fc + 2)°i 



i-l 

E 



n (l-ri(O) r2(z) 



dt 



< 



< 



a2 (fc + 2)° 



E' 



(i + 1)'^! ^ (fc + 

2^^Q2 (fc + 2)"i 

k—j 



^ ^ fc=j+2 



(190) 



(191) 
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Now, if fli > 62, then 



E 



\l=k+l ) 



< 



+ 



7^ H k^'- 



k=j+2 



k=]+2 



< 



(i + l)'^! 

[(z + 1)''-^= + int^e^-'^dt] 



(192) 



It is clear that the second term remains bounded if = 1 and goes to zero if 82 > 1- The case ai < 62 can be 
resolved similarly, which completes the proof. 



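Appendix II 
Proofs of Lemmas 17, 16 

Proof: [Proof of Lemma 17] It follows from eqns. (167), (170) and the fact that 

$$(1_N \otimes I_M)^T (\bar{L} \otimes I_M) = 0 \qquad (193)$$

that the evolution of the sequence, $\{\tilde{x}^\circ_{avg}(i)\}_{i \geq 0}$, is given by 

$$\tilde{x}^\circ_{avg}(i+1) = \tilde{x}^\circ_{avg}(i) - \alpha(i) \left[ \tilde{x}^\circ_{avg}(i) - \frac{1}{N} \sum_{n=1}^{N} g_n(z_n(i)) \right] \qquad (194)$$

We note that eqn. (194) can be written as 

$$\tilde{x}^\circ_{avg}(i+1) = \tilde{x}^\circ_{avg}(i) + \alpha(i) \left[ R(\tilde{x}^\circ_{avg}(i)) + \Gamma(i+1, \tilde{x}^\circ_{avg}(i), \omega) \right] \qquad (195)$$

where 

$$R(y) = -(y - h(\theta^*)), \qquad \Gamma(i+1, y, \omega) = \frac{1}{N} \sum_{n=1}^{N} g_n(z_n(i)) - h(\theta^*), \qquad y \in \mathbb{R}^{M \times 1} \qquad (196)$$

Such a definition of $R(\cdot)$, $\Gamma(\cdot)$ clearly satisfies Assumptions B.1, B.2 of Theorem 5. Now, defining 

$$V(y) = \| y - h(\theta^*) \|^2 \qquad (197)$$

we have 

$$V(h(\theta^*)) = 0, \qquad V(y) > 0, \ y \neq h(\theta^*), \qquad \lim_{\|y\| \to \infty} V(y) = \infty \qquad (198)$$

Also, we have for $\epsilon > 0$ 

$$\sup_{\epsilon < \|y - h(\theta^*)\| < \frac{1}{\epsilon}} \left( R(y), V_y(y) \right) = \sup_{\epsilon < \|y - h(\theta^*)\| < \frac{1}{\epsilon}} \left( -2 \| y - h(\theta^*) \|^2 \right) \leq -2\epsilon^2 < 0 \qquad (199)$$

thus verifying Assumption B.3. Finally, from eqns. (111), (196) we have 

$$\| R(y) \|^2 + \mathbb{E}_{\theta^*}\left[ \| \Gamma(i+1, y, \omega) \|^2 \right] \leq k_1 (1 + V(y)) \leq k_1 (1 + V(y)) - \left( R(y), V_y(y) \right) \qquad (200)$$

for $k_1 = \max(1, \eta(\theta^*))$. Thus the Assumptions B.1-B.4 are satisfied, and we have the claim in eqn. (173). 
To establish eqn. (174), we note that, for sufficiently large i, 

$$\mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ_{avg}(i) - h(\theta^*) \|^2 \right] = (1 - \alpha(i-1))^2 \, \mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ_{avg}(i-1) - h(\theta^*) \|^2 \right] + \alpha^2(i-1) \eta(\theta^*)$$
$$\leq (1 - \alpha(i-1)) \, \mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ_{avg}(i-1) - h(\theta^*) \|^2 \right] + \alpha^2(i-1) \eta(\theta^*) \qquad (201)$$

where the last step follows from the fact that $0 \leq (1 - \alpha(i)) \leq 1$ for sufficiently large i. Continuing the recursion 
in eqn. (201), we have for sufficiently large $j < i$ 

$$\mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ_{avg}(i) - h(\theta^*) \|^2 \right] \leq \left( \prod_{k=j}^{i-1} (1 - \alpha(k)) \right) \mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ_{avg}(j) - h(\theta^*) \|^2 \right] + \eta(\theta^*) \sum_{k=j}^{i-1} \left[ \left( \prod_{l=k+1}^{i-1} (1 - \alpha(l)) \right) \alpha^2(k) \right] \qquad (202)$$

From Assumption D.5, we note that $\sum_{k=j}^{i-1} \alpha(k) \to \infty$ as $i \to \infty$ because $0.5 < \tau_1 \leq 1$. Thus, the first term 
in eqn. (202) goes to zero as $i \to \infty$. The second term in eqn. (202) falls under the purview of Lemma 18 with 
$\delta_1 = \tau_1$ and $\delta_2 = 2\tau_1$ and hence goes to zero as $i \to \infty$. We thus have 

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ_{avg}(i) - h(\theta^*) \|^2 \right] = 0 \qquad (203)$$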
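
The averaged recursion in eqn. (194) is a classical stochastic approximation scheme, and its behavior is easy to observe numerically. The sketch below is a toy scalar version under assumptions not taken from the paper: the averaged observation is modeled as the target value plus unit-variance Gaussian noise, and the exponent 0.8 is an arbitrary choice in the admissible range. The function name `averaged_recursion` is hypothetical.

```python
import numpy as np


def averaged_recursion(h_true=2.0, tau1=0.8, n_iter=50000, seed=0):
    """Iterate x(i+1) = x(i) - alpha(i) * (x(i) - g_bar(i)) with decaying alpha."""
    rng = np.random.default_rng(seed)
    x_avg = 0.0
    for i in range(n_iter):
        alpha = 1.0 / (i + 1) ** tau1
        # Stand-in for (1/N) * sum_n g_n(z_n(i)): an unbiased noisy reading of h(theta*).
        g_bar = h_true + rng.standard_normal()
        x_avg = x_avg - alpha * (x_avg - g_bar)
    return x_avg


print(averaged_recursion())
```

Because the weights are not summable but are square summable, the iterate forgets its initial condition while the accumulated noise variance shrinks, which mirrors the two terms bounded in eqn. (202).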



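Proof: [Proof of Lemma 16] Recall from eqns. (167), (194) that the evolution of the sequences $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ and 
$\{\tilde{x}^\circ_{avg}(i)\}_{i \geq 0}$ are given by 

$$\tilde{x}^\circ(i+1) = \tilde{x}^\circ(i) - \beta(i) (\bar{L} \otimes I_M) \tilde{x}^\circ(i) - \alpha(i) [\tilde{x}^\circ(i) - J(z(i))] \qquad (204)$$

$$\tilde{x}^\circ_{avg}(i+1) = \tilde{x}^\circ_{avg}(i) - \alpha(i) \left[ \tilde{x}^\circ_{avg}(i) - \frac{1}{N} \sum_{n=1}^{N} g_n(z_n(i)) \right] \qquad (205)$$

To establish the claim in eqn. (171), from Lemma 17 it suffices to prove 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} \left\| \tilde{x}^\circ(i) - (1_N \otimes \tilde{x}^\circ_{avg}(i)) \right\| = 0 \right] = 1 \qquad (206)$$

To this end define the matrix 

$$P = \frac{1}{N} (1_N \otimes I_M) (1_N \otimes I_M)^T \qquad (207)$$

and note that 

$$P \tilde{x}^\circ(i) = 1_N \otimes \tilde{x}^\circ_{avg}(i), \quad \forall i \qquad (208)$$

From eqns. (204), (205), we then have 

$$\tilde{x}^\circ(i+1) - (1_N \otimes \tilde{x}^\circ_{avg}(i+1)) = \left[ I_{NM} - \beta(i) (\bar{L} \otimes I_M) - \alpha(i) I_{NM} - P \right] \left[ \tilde{x}^\circ(i) - (1_N \otimes \tilde{x}^\circ_{avg}(i)) \right] + \alpha(i) \left[ J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right] \qquad (209)$$

Choose $\delta$ satisfying 

$$0 < \delta < \tau_1 - \frac{1}{2+\epsilon_1} - \tau_2 \qquad (210)$$

and note that such a choice exists by Assumption D.5. We now claim that 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} \frac{1}{(i+1)^{\frac{1}{2+\epsilon_1} + \delta}} \left\| J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right\| = 0 \right] = 1 \qquad (211)$$

Indeed, consider any $\epsilon > 0$. We then have from Assumption D.4 and Chebyshev's inequality 

$$\sum_{i \geq 0} \mathbb{P}_{\theta^*}\left[ \frac{1}{(i+1)^{\frac{1}{2+\epsilon_1} + \delta}} \left\| J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right\| > \epsilon \right] \leq \sum_{i \geq 0} \frac{1}{(i+1)^{1 + \delta(2+\epsilon_1)} \epsilon^{2+\epsilon_1}} \mathbb{E}_{\theta^*}\left[ \left\| J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right\|^{2+\epsilon_1} \right] < \infty$$

It then follows from the Borel-Cantelli Lemma (see [37]) that for arbitrary $\epsilon > 0$ 

$$\mathbb{P}_{\theta^*}\left[ \frac{1}{(i+1)^{\frac{1}{2+\epsilon_1} + \delta}} \left\| J(z(i)) - \frac{1}{N} (1_N \otimes I_M)^T J(z(i)) \right\| > \epsilon \ \text{i.o.} \right] = 0 \qquad (212)$$

where i.o. stands for infinitely often. Since the above holds for $\epsilon$ arbitrarily small, we have (see [37]) the a.s. claim 
in eqn. (211). 

Consider the set $\Omega_1 \subset \Omega$ with $\mathbb{P}_{\theta^*}[\Omega_1] = 1$, where the a.s. property in eqn. (211) holds. Also, consider the 
set $\Omega_2 \subset \Omega$ with $\mathbb{P}_{\theta^*}[\Omega_2] = 1$, where the sequence $\{\tilde{x}^\circ_{avg}(i)\}_{i \geq 0}$ converges to $h(\theta^*)$. Let $\Omega_3 = \Omega_1 \cap \Omega_2$. It is 
clear that $\mathbb{P}_{\theta^*}[\Omega_3] = 1$. We will now show that, on $\Omega_3$, the sample paths of the sequence $\{\tilde{x}^\circ(i)\}_{i \geq 0}$ converge to 
$(1_N \otimes h(\theta^*))$, thus proving the Lemma. In the following we index the sample paths by $\omega$ to emphasize the fact 
that we are establishing properties pathwise. 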
From eqn. ( 12091 ). we have on G 

\\x{i + l,Lu)- (liv®5°vg(i + l,(^))|| < \\I - P{i) (L (g) Im) -a{i)lNM - P\\\\Si°{i (^^))|| 

1 T 

J(z(i,tj)) - — (Ijv (g) /m) J{z(i,Lo)) 



2 + .1 



For sufficiently large i, we have 

II J - (L^Im) - a{i)lNM -P\\<1- P{i)\2{L) 
From eqn. ( I212l i for w e 1^3 we can choose e > and j{u!) such that 

< e, Vi > j(cj) 



1 T 

J(z(i, w)) - — (Iat (K) /m) J(z(«,lj)) 



(213) 



(214) 



Let be sufficiently large such that eqn. (12131 ) is also satisfied in addition to eqn. ( I214l i. We then have for 

e r^s, i> j{uj) 



^(z,c.)-(ljv®x:,g(z,L.))|| < I II (l-/3(fc)A2(L)) I ||X°(jH,c.)- (lAr®x:vgaM,^))| 



+ae 

k=j(u) 



n (1-/3(0A2(L)) 



(fc+ir 



For the first term on the R.H.S. of eqn. ( I215l l we note that 



(215) 



which goes to zero as i oo since r2 < 1 by Assumption D.5. Hence the first term on the R.H.S. of eqn. ( 12151 ) 
goes to zero as i ^ cx). The summation in the second term on the R.H.S. of eqn. ( I215l l falls under the purview of 
Lemma [TS] with 5i ~ T2 and 82 = — 6. It follows from the choice of 6 in eqn. ( 1210b and Assumption D.5 



that 61 < 52 and hence the term 
conclude from eqn. (1215b that, for a; g fia 



(fc+i) 



lim ||x°(i,Lj) - {In (^ic° {i,uj))\\ = 



as i ^ 00. We then 



(216) 



The Lemma then follows from the fact that Pg. [[l^] — 1. 
To establish eqn. ( 1172b . we have from eqn. ( 12091 ) 



||5°(i + l) - (ljv®5avg(j + l))||' < \\I-I3{i) {L®Im) -a{i)lNM -P\\ ||5°(i) - {1n 
+2a{i) \\I -f3{i) (L®Im) -a{i)lNM - P\\ ||X°(i) - (Iat ®S°vg(j))|| ||j(zW) - ^ (Ijv ® /m)^ J(z(i))|| 

W{i) II J(z(i)) - i (liv ® lu f J(z(i))||' (217) 
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Taking expectations on both sides and from eqn. (1151b 

Efl. + (liv®5°vg(i + l))f] < ||j-/3(i) (L^/m) -a(i)/iVM 



+ 



2a(i) ||/ - (I ® Jm) - a(j)/iVAf - P\\ i^i {6*) Ee. - (Iat ® 5avg(i)) H'] 

+2Q(i) II J - (L® Jai) - a(j)/]VAf - P|| «:i (S*) + a2(i)K2(e*) 



where we used the inequality 



X {I) - [In <X) X, 



(lAr®53%(i))|| < ||X°(^)- (l^v<»X°,g(i))f + 1, Vi 



(218) 



Choose j sufficiently large such that 

\\l-P{i){L®lM)-a{i)lNM-P\\l-mML), yi>j (219) 
For i > j, it can then be shown that 

Eg. [||5°(i + l)-(liv®x°vg(i + l))||'] < [l-/3(i)A2(I)+2a(i)Ki(r)]Ee. [||5°(i) - (ijv ® 5°vgW) ||'] 



+a{i)ci 

where ci > is a constant. Now choose ji > j and < C2 < A2 (i jf| such that, 



1 - l3{i)X2iL) + 2a{i)Kii0*) < 1 - /3(i)c2, Vz > ji 



(220) 



(221) 



Then for i > ji 



(222) 



+C1 ^ 



n (l-/3(0c2) a(fc) 



The first term on the R.H.S. of eqn. ( I220l i goes to zero as i ^ 00 by the argument given in eqn. ( I215l l, while the 
second term falls under the purview of Lemma [Ts] and also goes to zero as i ^ 00. We thus have the claim in 
eqn. (1172b . ■ 

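Appendix III 
Proof of Lemma 19 

Proof: [Proof of Lemma 19] From eqns. (167), (168) we have 

$$\hat{x}(i+1) - \tilde{x}^\circ(i+1) = \left[ I_{NM} - \beta(i) (\bar{L} \otimes I_M) - \alpha(i) I_{NM} \right] \left[ \hat{x}(i) - \tilde{x}^\circ(i) \right] - \beta(i) (\Upsilon(i) + \Psi(i)) \qquad (223)$$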



Such a choice exists because ri > T2. 
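For sufficiently large j, we have 

$$\left\| I_{NM} - \beta(i) (\bar{L} \otimes I_M) - \alpha(i) I_{NM} \right\| \leq 1 - \alpha(i), \quad \forall i \geq j \qquad (224)$$

We then have from eqn. (223), for $i \geq j$, 

$$\mathbb{E}_{\theta^*}\left[ \| \hat{x}(i+1) - \tilde{x}^\circ(i+1) \|^2 \right] \leq (1 - \alpha(i))^2 \, \mathbb{E}_{\theta^*}\left[ \| \hat{x}(i) - \tilde{x}^\circ(i) \|^2 \right] + \beta^2(i) \, \mathbb{E}_{\theta^*}\left[ \| \Upsilon(i) + \Psi(i) \|^2 \right]$$
$$\leq (1 - \alpha(i)) \, \mathbb{E}_{\theta^*}\left[ \| \hat{x}(i) - \tilde{x}^\circ(i) \|^2 \right] + \eta_q \beta^2(i) \qquad (225)$$

where the last step follows from the fact that $0 \leq (1 - \alpha(i)) \leq 1$ for $i \geq j$ and eqn. (17). Continuing the recursion, 
we have 

$$\mathbb{E}_{\theta^*}\left[ \| \hat{x}(i) - \tilde{x}^\circ(i) \|^2 \right] \leq \left( \prod_{k=j}^{i-1} (1 - \alpha(k)) \right) \mathbb{E}_{\theta^*}\left[ \| \hat{x}(j) - \tilde{x}^\circ(j) \|^2 \right] + \eta_q \sum_{k=j}^{i-1} \left[ \left( \prod_{l=k+1}^{i-1} (1 - \alpha(l)) \right) \beta^2(k) \right] \qquad (226)$$

By a similar argument as in the proof of Lemma 17, we note that the first term on the R.H.S. of eqn. (226) goes 
to zero as $i \to \infty$. The second term falls under the purview of Lemma 18 with $\delta_1 = \tau_1$ and $\delta_2 = 2\tau_2$ and goes to 
zero as $i \to \infty$ since, by Assumption D.5, $2\tau_2 > \tau_1$. We thus have 

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \hat{x}(i) - \tilde{x}^\circ(i) \|^2 \right] = 0 \qquad (227)$$

which shows that the sequence $\{\| \hat{x}(i) - \tilde{x}^\circ(i) \|\}_{i \geq 0}$ converges to 0 in $\mathcal{L}_2$ (mean-squared sense). We then have 
from Lemma 16 

$$\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \hat{x}(i) - 1_N \otimes h(\theta^*) \|^2 \right] \leq 2 \lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \hat{x}(i) - \tilde{x}^\circ(i) \|^2 \right] + 2 \lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \| \tilde{x}^\circ(i) - 1_N \otimes h(\theta^*) \|^2 \right] = 0 \qquad (228)$$

thus establishing the claim in eqn. (180). 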

We now show that the sequence {||x(i) — x°(i)||}j>Q also converges a.s. to a finite random variable. Choose j 
sufficiently large as in eqn. ( 12241 ). We then have from eqn. ( 12231 ) 



x(z)-x°(z) 







n. 


Inm — f- 


\k=j 








-E 


[(n 




. \i=k+i 


i-l 




-E 


[(n 


k=3 


. \i=k+i 


. (I229] 


converg 



(229) 



of Lemma [Tt] Since the sequence {T(i)}^>„ is i.i.d., the second term is a weighted summation of independent 
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random vectors. Define the triangular array of weight matrices, {A^ fc, j < k < i — 1}^ , by 



i-1 



Ak= n {iNM-m{L(^^iM)~a{i)i)m 



(230) 



/=fe+i 



We then have 



i-1 

E 

k=j 



i-1 



n i^NM - m {L Im) - ail)l) m'^{k) 



Kl=k+1 



i-1 
k=j 



(231) 



By Lemma [18] and Assumption D.5 we note that 



i-1 



limsup^^ < limsup^^ 



k=j 



i-1 



k=j L \;=fc+i 



It then follows that 



i-1 



sup^ = C3 < oo 



(232) 



(233) 



i>j 



k=j 



The sequence \ X^I—l ^i,fcf (fc) \ then converges a.s. to a finite random vector by standard results from the 

L ' J i>j 

limit theory of weighted summations of independent random vectors (see [40], [41], [42]). 

In a similar way, the last term on the R.H.S of eqn. (1229b converges a.s. to a finite random vector since by the prop- 
erties of dither the sequence is i.i.d. It then follows from eqn. ( 12291 ) that the sequence {x(i) — x°(z)}^>p 
converges a.s. to a finite random vector, which in turn implies that the sequence {||x(i) — x°(i)||}j>Q converges a.s. 
to a finite random variable. However, we have already shown that the sequence {||x(i) — x°(i)||}j>Q converges in 
mean-squared sense to 0. It then follows from the uniqueness of the mean-squared and a.s. limit, that the sequence 
{||x(i) — x°(i)||}^>p converges a.s. to 0. In other words. 



lim -x°(i)|| = 

i — *oo 

The claim in eqn. ( I179l l then follows from eqn. (I234l i and Lemma [16 



= 1 



(234) 



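Appendix IV 
Proofs of Theorems 14, 15 

Proof: [Proof of Theorem 14] Recall the evolution of the sequences $\{\tilde{x}(i)\}_{i \geq 0}$, $\{\hat{x}(i)\}_{i \geq 0}$ in eqns. (155), (168). 
Then writing $L(i) = \bar{L} + \tilde{L}(i)$ and using the fact that 

$$(\tilde{L}(i) \otimes I_M) \hat{x}(i) = (\tilde{L}(i) \otimes I_M) \hat{x}_{\mathcal{C}^\perp}(i), \quad \forall i \qquad (235)$$

we have from eqns. (155), (168) 

$$\tilde{x}(i+1) - \hat{x}(i+1) = \left[ I_{NM} - \beta(i) (L(i) \otimes I_M) - \alpha(i) I_{NM} \right] (\tilde{x}(i) - \hat{x}(i)) - \beta(i) (\tilde{L}(i) \otimes I_M) \hat{x}_{\mathcal{C}^\perp}(i) \qquad (236)$$

For ease of notation, introduce the sequence $\{y(i)\}_{i \geq 0}$, given by 

$$y(i) = \tilde{x}(i) - \hat{x}(i) \qquad (237)$$

To prove eqn. (157), it clearly suffices (from Lemma 19) to prove 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} y(i) = 0 \right] = 1 \qquad (238)$$

From eqn. (236) we note that the evolution of the sequence $\{y(i)\}_{i \geq 0}$ is given by 

$$y(i+1) = \left[ I_{NM} - \beta(i) (\bar{L} \otimes I_M) - \alpha(i) I_{NM} \right] y(i) - \beta(i) (\tilde{L}(i) \otimes I_M) y(i) - \beta(i) (\tilde{L}(i) \otimes I_M) \hat{x}_{\mathcal{C}^\perp}(i) \qquad (239)$$

The sequence $\{y(i)\}_{i \geq 0}$ is not Markov, in general, because of the presence of the term $\beta(i) (\tilde{L}(i) \otimes I_M) \hat{x}_{\mathcal{C}^\perp}(i)$ 
on the R.H.S. However, it follows from Lemma 19 that 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} \hat{x}_{\mathcal{C}^\perp}(i) = 0 \right] = 1 \qquad (240)$$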

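and, hence, asymptotically its effect diminishes. However, the sequence $\{\hat{x}_{\mathcal{C}^\perp}(i)\}_{i \geq 0}$ is not uniformly bounded over 
sample paths and, hence, we use truncation arguments (see, for example, [36]). For a scalar a, define its truncation 
$(a)^R$ at level $R > 0$ by 

$$(a)^R = \begin{cases} \frac{a}{|a|} \min(|a|, R) & \text{if } a \neq 0 \\ 0 & \text{if } a = 0 \end{cases} \qquad (241)$$

For a vector, the truncation operation applies component-wise. For $R > 0$, we also consider the sequences, 
$\{y_R(i)\}_{i \geq 0}$, given by 

$$y_R(i+1) = \left[ I_{NM} - \beta(i) (\bar{L} \otimes I_M) - \alpha(i) I_{NM} \right] y_R(i) - \beta(i) (\tilde{L}(i) \otimes I_M) y_R(i) - \beta(i) (\tilde{L}(i) \otimes I_M) \left( \hat{x}_{\mathcal{C}^\perp}(i) \right)^R \qquad (242)$$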
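
The truncation operator of eqn. (241), applied component-wise, is simple to state in code. The sketch below is illustrative; the function name `truncate` is hypothetical.

```python
import numpy as np


def truncate(a, R):
    """Component-wise truncation at level R > 0: keep the sign of each entry
    and cap its magnitude at R; zero entries stay zero."""
    a = np.asarray(a, dtype=float)
    return np.sign(a) * np.minimum(np.abs(a), R)


v = truncate([-3.0, 0.0, 0.5, 2.0], R=1.0)
print(v)
```

Entries whose magnitude is already below the level are left unchanged, so a sample path that never exceeds the level is unaffected by truncation; the comparison between the truncated and untruncated recursions rests on this observation.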

We will show that for every $R > 0$ 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} y_R(i) = 0 \right] = 1 \qquad (243)$$



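Now, the sequence $\{\hat{x}_{\mathcal{C}^\perp}(i)\}_{i \geq 0}$ converges a.s. to zero, and, hence, for every $\epsilon > 0$, there exists $R(\epsilon) > 0$ (see [37]), 
such that 

$$\mathbb{P}_{\theta^*}\left[ \sup_{i \geq 0} \left\| \hat{x}_{\mathcal{C}^\perp}(i) \right\| < R(\epsilon) \right] > 1 - \epsilon \qquad (244)$$

and, hence, from eqns. (239), (242) 

$$\mathbb{P}_{\theta^*}\left[ \sup_{i \geq 0} \left\| y(i) - y_{R(\epsilon)}(i) \right\| = 0 \right] > 1 - \epsilon \qquad (245)$$

This, together with eqn. (243), will then imply 

$$\mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} y(i) = 0 \right] > 1 - \epsilon \qquad (246)$$

Since $\epsilon > 0$ is arbitrary in eqn. (246), we will be able to conclude eqn. (157). Thus, the proof reduces to establishing 
eqn. (243) for every $R > 0$, which is carried out in the following. 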

For a given R > consider the recursion given in eqn. ( 12421) . Choose ei > and £2 < such that 

1 - £2 < 2r2 - £1 (247) 

and note that the fact, that T2 > .5 in Assumption D.5 permits such choice of ei,£2- Define the function V : 

V (i, x) = (L (g) Im) X + pi'^ (248) 

where p > is a constant. Recall the filtration {.?^i}j>o given in eqn. ( II 19l l 



= a (^x(O), {z„0-)}i<^ , *0-)} 



0<j<i 



(249) 



to which all the processes of interest are adapted. We now show that there exists an integer i^j > sufficiently 
large, such that the process {V{i,yR{i))}^y^^ is a non-negative supermartingale w.r.t. the filtration {.?^i}j>j^. To 
this end, we note that 

Eg, [Vil + 1, ynii + 1)) I J^,] - Vit, ynii)) = + l)"^yS(^ + 1) (1 hi) Ynii + 1) + p{t + If' 
= + 1)"' [yfl„c^ (0 ® hi) yB.,c^ (i) - W{i)yRfi^ (*) ® hi) ' yfl„c^ (*) 

Im) Yr.c^ (i) + 2/5(j)a(j)yfl^c^ (*) ^m) yR,c^ (i) 

Im) Yrx^ (i) 



+f{i)Eg, y^ c-L (*) (^(«) Im) (L «) hi) ^Af) y^x^ (0 

+2/32(j)Ee. [y?;„c^(*) ® ^m) (I® /m) (^(«) ® hi) i^c^i^)) 
+pHi)Ee, [{Scj:^{t))"' (L{i) (g> hi) (L®Im) {L{i)®hi) {^c^{i)) 
+ {i + 1)-'^ - «"^yS,c^(0 (I® /m) yfl,c^(«) - Pi"' 



R 



R 



where we repeatedly used the fact that 



(L (g) Im) yR{i) = {L® hi) yR,c^ ® -^m) yfl(i) = (^L{i) «) /a/) yi?,c-L (i) 



(250) 



and is independent of !Fi. 
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In going to the next step we use the following inequalities, where ci > is a constant: 



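$$ y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right)^2 y_{R,\mathcal{C}^\perp}(i) \ge \lambda_2^2(\overline{L}) \left\| y_{R,\mathcal{C}^\perp}(i) \right\|^2 \ge \frac{\lambda_2^2(\overline{L})}{\lambda_N(\overline{L})}\, y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i) \tag{251} $$

$$ y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right)^3 y_{R,\mathcal{C}^\perp}(i) \le \lambda_N^3(\overline{L}) \left\| y_{R,\mathcal{C}^\perp}(i) \right\|^2 \le \frac{\lambda_N^3(\overline{L})}{\lambda_2(\overline{L})}\, y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i) \tag{252} $$

$$ y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right)^2 y_{R,\mathcal{C}^\perp}(i) \le \lambda_N^2(\overline{L}) \left\| y_{R,\mathcal{C}^\perp}(i) \right\|^2 \le \frac{\lambda_N^2(\overline{L})}{\lambda_2(\overline{L})}\, y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i) \tag{253} $$

$$ \mathbb{E}_{\theta^*}\left[ y_{R,\mathcal{C}^\perp}^T(i) \left( \widetilde{L}(i) \otimes I_M \right) \left( \overline{L} \otimes I_M \right) \left( \widetilde{L}(i) \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i) \mid \mathcal{F}_i \right] \le c_1 \lambda_N(\overline{L})\, \mathbb{E}_{\theta^*}\left[ \left\| y_{R,\mathcal{C}^\perp}(i) \right\|^2 \mid \mathcal{F}_i \right] \le \frac{c_1 \lambda_N(\overline{L})}{\lambda_2(\overline{L})}\, y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i) \tag{254} $$

$$
\begin{aligned}
\mathbb{E}_{\theta^*}\left[ y_{R,\mathcal{C}^\perp}^T(i) \left( \widetilde{L}(i) \otimes I_M \right) \left( \overline{L} \otimes I_M \right) \left( \widetilde{L}(i) \otimes I_M \right) \widetilde{x}^R_{\mathcal{C}^\perp}(i) \mid \mathcal{F}_i \right]
&\le \mathbb{E}_{\theta^*}\left[ \left\| y_{R,\mathcal{C}^\perp}(i) \right\| \left\| \overline{L} \otimes I_M \right\| \left\| \widetilde{L}(i) \otimes I_M \right\|^2 \left\| \widetilde{x}^R_{\mathcal{C}^\perp}(i) \right\| \mid \mathcal{F}_i \right] && (255) \\
&\le R\, c_1 \lambda_N(\overline{L}) \left\| y_{R,\mathcal{C}^\perp}(i) \right\| && (256) \\
&\le R\, c_1 \lambda_N(\overline{L}) + R\, c_1 \lambda_N(\overline{L}) \left\| y_{R,\mathcal{C}^\perp}(i) \right\|^2 && (257) \\
&\le R\, c_1 \lambda_N(\overline{L}) + \frac{R\, c_1 \lambda_N(\overline{L})}{\lambda_2(\overline{L})}\, y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i) && (258)
\end{aligned}
$$

$$
\begin{aligned}
\mathbb{E}_{\theta^*}\left[ \left( \widetilde{x}^R_{\mathcal{C}^\perp}(i) \right)^T \left( \widetilde{L}(i) \otimes I_M \right) \left( \overline{L} \otimes I_M \right) \left( \widetilde{L}(i) \otimes I_M \right) \widetilde{x}^R_{\mathcal{C}^\perp}(i) \mid \mathcal{F}_i \right]
&\le \mathbb{E}_{\theta^*}\left[ \left\| \widetilde{x}^R_{\mathcal{C}^\perp}(i) \right\|^2 \left\| \overline{L} \otimes I_M \right\| \left\| \widetilde{L}(i) \otimes I_M \right\|^2 \mid \mathcal{F}_i \right] && (259) \\
&\le c_1 \lambda_N(\overline{L}) \left\| \widetilde{x}^R_{\mathcal{C}^\perp}(i) \right\|^2 && (260) \\
&\le R^2 c_1 \lambda_N(\overline{L}) && (261)
\end{aligned}
$$

where going from eqn. (255) to eqn. (256) we use the fact that $\| \widetilde{x}^R_{\mathcal{C}^\perp}(i) \| \le R$. Using inequalities (251)-(261), we have from eqn. (250)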



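$$
\begin{aligned}
\mathbb{E}_{\theta^*}\left[ V(i+1, y_R(i+1)) \mid \mathcal{F}_i \right] - V(i, y_R(i)) \le{}& \Bigg[ (i+1)^{\varepsilon_1} \Bigg( 1 - 2\beta(i) \frac{\lambda_2^2(\overline{L})}{\lambda_N(\overline{L})} - 2\alpha(i) + 2\beta(i)\alpha(i) \frac{\lambda_N^2(\overline{L})}{\lambda_2(\overline{L})} + \beta^2(i) \frac{\lambda_N^3(\overline{L})}{\lambda_2(\overline{L})} \\
& \quad + \alpha^2(i) + \beta^2(i) \frac{c_1 \lambda_N(\overline{L})}{\lambda_2(\overline{L})} + 2\beta^2(i) \frac{R\, c_1 \lambda_N(\overline{L})}{\lambda_2(\overline{L})} \Bigg) - i^{\varepsilon_1} \Bigg]\, y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i) \\
& + \Big[ \rho (i+1)^{\varepsilon_2} - \rho\, i^{\varepsilon_2} + (i+1)^{\varepsilon_1} \beta^2(i) \left( 2 R\, c_1 \lambda_N(\overline{L}) + R^2 c_1 \lambda_N(\overline{L}) \right) \Big]
\end{aligned} \tag{262}
$$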
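The spectral bounds behind inequalities (251)-(253) can be checked numerically on a small example. The sketch below is illustrative only and rests on assumptions that are not in the paper: it takes $M = 1$, uses the Laplacian of the 3-node path graph (eigenvalues 0, 1, 3) as a stand-in for the mean Laplacian, and the helper names `matvec`, `dot` and `check_bounds` are hypothetical.

```python
import random

# Assumed example (not from the paper): Laplacian of the 3-node path graph.
# Its eigenvalues are 0, 1 and 3, so lambda_2 = 1 and lambda_N = 3.
L = [[1.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 1.0]]
LAM2, LAMN = 1.0, 3.0


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))


def check_bounds(y, tol=1e-9):
    """Check the quadratic-form bounds for one vector y (scalar case M = 1)."""
    mean = sum(y) / len(y)
    y_perp = [v - mean for v in y]   # component orthogonal to the consensus subspace
    n2 = dot(y_perp, y_perp)         # ||y_perp||^2
    Ly = matvec(L, y)
    LLy = matvec(L, Ly)
    q1 = dot(y, Ly)                  # y^T L y
    q2 = dot(Ly, Ly)                 # y^T L^2 y
    q3 = dot(Ly, LLy)                # y^T L^3 y
    return (
        LAM2 * n2 <= q1 + tol                       # lambda_2 ||y_perp||^2 <= y^T L y
        and q1 <= LAMN * n2 + tol                   # y^T L y <= lambda_N ||y_perp||^2
        and q2 + tol >= (LAM2 ** 2 / LAMN) * q1     # lower bound of the type in (251)
        and q3 <= (LAMN ** 3 / LAM2) * q1 + tol     # upper bound on the cubic form
    )


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in range(1000)]
    print(all(check_bounds(y) for y in samples))
```

The check passes for any vector because every quadratic form in the Laplacian only sees the component of $y$ orthogonal to the consensus subspace, where the eigenvalues lie between $\lambda_2$ and $\lambda_N$.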



For the first term on the R.H.S. of eqn. (262), involving $y_{R,\mathcal{C}^\perp}^T(i) \left( \overline{L} \otimes I_M \right) y_{R,\mathcal{C}^\perp}(i)$, the coefficient $-2\beta(i)(i+1)^{\varepsilon_1}$ dominates all other coefficients eventually ($\tau_2 < 1$ by Assumption D.5) and hence the first term on the R.H.S. of eqn. (262) becomes negative eventually (for sufficiently large $i$). The second term on the R.H.S. of eqn. (262) also becomes negative eventually because $\rho\varepsilon_2 < 0$ and $1 - \varepsilon_2 < 2\tau_2 - \varepsilon_1$ by assumption. Hence there exists sufficiently large $i$, say $i_R$, such that,

$$ \mathbb{E}_{\theta^*}\left[ V(i+1, y_R(i+1)) \mid \mathcal{F}_i \right] - V(i, y_R(i)) \le 0, \quad \forall i \ge i_R \tag{263} $$

which shows that the sequence $\{V(i, y_R(i))\}_{i \ge i_R}$ is a non-negative supermartingale w.r.t. the filtration $\{\mathcal{F}_i\}_{i \ge i_R}$. Thus, $\{V(i, y_R(i))\}_{i \ge i_R}$ converges a.s. to a finite random variable (see [37]). It is clear that the sequence $\rho\, i^{\varepsilon_2}$ goes to zero as $\varepsilon_2 < 0$. We then have



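$$ \mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} i^{\varepsilon_1}\, y_R^T(i) \left( \overline{L} \otimes I_M \right) y_R(i) \text{ exists and is finite} \right] = 1 \tag{264} $$

Since $i^{\varepsilon_1} \to \infty$ as $i \to \infty$, it follows

$$ \mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} y_R^T(i) \left( \overline{L} \otimes I_M \right) y_R(i) = 0 \right] = 1 \tag{265} $$

Since $y_R^T(i) \left( \overline{L} \otimes I_M \right) y_R(i) \ge \lambda_2(\overline{L}) \left\| y_{R,\mathcal{C}^\perp}(i) \right\|^2$, from eqn. (265) we have

$$ \mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} y_{R,\mathcal{C}^\perp}(i) = 0 \right] = 1 \tag{266} $$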



To establish eqn. (243) we note that

$$ y_{R,\mathcal{C}}(i) = 1_N \otimes y_{R,\mathrm{avg}}(i) \tag{267} $$

where

$$ y_{R,\mathrm{avg}}(i+1) = \left( 1 - \alpha(i) \right) y_{R,\mathrm{avg}}(i) \tag{268} $$

Since $\sum_{i \ge 0} \alpha(i) = \infty$, it follows from standard arguments that $y_{R,\mathrm{avg}}(i) \to 0$ as $i \to \infty$. We then have from eqn. (267)

$$ \mathbb{P}_{\theta^*}\left[ \lim_{i \to \infty} y_{R,\mathcal{C}}(i) = 0 \right] = 1 \tag{269} $$

which together with eqn. (266) establishes eqn. (243). The claim in eqn. (157) then follows from the arguments above.
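The "standard arguments" for the averaged recursion in eqn. (268) can be illustrated with a short scalar simulation. This is a sketch under assumptions not fixed by the text above: the weight is taken to be $\alpha(i) = a/(i+1)^{\tau_1}$, and the function name `run_avg_recursion` and the parameter values are hypothetical.

```python
def run_avg_recursion(y0, a, tau, steps):
    """Iterate y(i+1) = (1 - alpha(i)) * y(i) with alpha(i) = a / (i+1)**tau.

    The form of alpha(i) is an assumption made for illustration.
    """
    y = y0
    for i in range(steps):
        alpha = a / (i + 1) ** tau
        y = (1.0 - alpha) * y
    return y


if __name__ == "__main__":
    # Non-summable weights (tau = 1): the product of (1 - alpha(i)) tends to zero.
    print(run_avg_recursion(1.0, 0.5, 1.0, 20000))
    # Summable weights (tau = 2): the product converges to a positive limit,
    # so the iterate does not vanish.
    print(run_avg_recursion(1.0, 0.5, 2.0, 20000))
```

The contrast between the two runs shows why the condition $\sum_{i \ge 0} \alpha(i) = \infty$ is what drives the averaged component to zero.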

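We now prove the claim in eqn. (158). Recall the matrix $P$ in eqn. (207). Using the fact

$$ P \left( L(i) \otimes I_M \right) = P \left( \overline{L} \otimes I_M \right) = 0, \quad \forall i \tag{270} $$

we have

$$ P\widetilde{x}(i+1) = P\widetilde{x}(i) - \alpha(i) \left[ P\widetilde{x}(i) - PJ(z(i)) \right] - \beta(i) P \left( \Upsilon(i) + \Psi(i) \right) \tag{271} $$

and similarly

$$ Px(i+1) = Px(i) - \alpha(i) \left[ Px(i) - PJ(z(i)) \right] - \beta(i) P \left( \Upsilon(i) + \Psi(i) \right) \tag{272} $$

Since the sequences $\{P\widetilde{x}(i)\}_{i \ge 0}$ and $\{Px(i)\}_{i \ge 0}$ follow the same recursion and start with the same initial state $Px(0)$, they are equal, and we have $\forall i$

$$ Py(i) = P \left( x(i) - \widetilde{x}(i) \right) = 0 \tag{273} $$

From eqn. (239) we then have

$$ y(i+1) = \left[ I_{NM} - \beta(i) \left( \overline{L} \otimes I_M \right) - \alpha(i) I_{NM} - P \right] y(i) - \beta(i) \left( \widetilde{L}(i) \otimes I_M \right) y(i) - \beta(i) \left( \widetilde{L}(i) \otimes I_M \right) \widetilde{x}(i) \tag{274} $$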



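By Lemma 19, to prove the claim in eqn. (157), it suffices to prove

$$ \lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \left\| y(i) \right\|^2 \right] = 0 \tag{275} $$

From Lemma 19 we note that the sequence $\{\widetilde{x}(i)\}_{i \ge 0}$ converges in $\mathcal{L}_2$ to $1_N \otimes h(\theta^*)$ and hence is $\mathcal{L}_2$ bounded, i.e., there exists a constant $c_3 > 0$, such that,

$$ \sup_{i \ge 0} \mathbb{E}_{\theta^*}\left[ \left\| \widetilde{x}(i) \right\|^2 \right] \le c_3 < \infty \tag{276} $$

Choose $j$ large enough, such that, for $i \ge j$

$$ \left\| I_{NM} - \beta(i) \left( \overline{L} \otimes I_M \right) - \alpha(i) I_{NM} - P \right\| \le 1 - \beta(i) \lambda_2(\overline{L}) \tag{277} $$

Noting that $\widetilde{L}(i)$ is independent of $\mathcal{F}_i$ and $\| \widetilde{L}(i) \| \le c_2$ for some constant $c_2 > 0$, we have for $i \ge j$,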



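$$
\begin{aligned}
\mathbb{E}_{\theta^*}\left[ \left\| y(i+1) \right\|^2 \right] ={}& \mathbb{E}_{\theta^*}\left[ y^T(i) \left( I_{NM} - \beta(i) \left( \overline{L} \otimes I_M \right) - \alpha(i) I_{NM} - P \right)^2 y(i) \right] \\
& + \beta^2(i)\, \mathbb{E}_{\theta^*}\left[ y^T(i) \left( \widetilde{L}(i) \otimes I_M \right)^2 y(i) \right] + \beta^2(i)\, \mathbb{E}_{\theta^*}\left[ \widetilde{x}^T(i) \left( \widetilde{L}(i) \otimes I_M \right)^2 \widetilde{x}(i) \right] \\
& + 2\beta^2(i)\, \mathbb{E}_{\theta^*}\left[ y^T(i) \left( \widetilde{L}(i) \otimes I_M \right)^2 \widetilde{x}(i) \right] \\
\le{}& \left( 1 - \beta(i) \lambda_2(\overline{L}) \right) \mathbb{E}_{\theta^*}\left[ \left\| y(i) \right\|^2 \right] + c_2^2 \beta^2(i)\, \mathbb{E}_{\theta^*}\left[ \left\| y(i) \right\|^2 \right] \\
& + c_2^2 c_3 \beta^2(i) + 2\beta^2(i) c_2^2 c_3^{1/2}\, \mathbb{E}_{\theta^*}^{1/2}\left[ \left\| y(i) \right\|^2 \right] \\
\le{}& \left( 1 - \beta(i) \lambda_2(\overline{L}) + c_2^2 \beta^2(i) + 2\beta^2(i) c_2^2 c_3^{1/2} \right) \mathbb{E}_{\theta^*}\left[ \left\| y(i) \right\|^2 \right] + \beta^2(i) \left( c_2^2 c_3 + 2 c_2^2 c_3^{1/2} \right)
\end{aligned} \tag{278}
$$

where in the last step we used the inequality

$$ \mathbb{E}_{\theta^*}^{1/2}\left[ \left\| y(i) \right\|^2 \right] \le 1 + \mathbb{E}_{\theta^*}\left[ \left\| y(i) \right\|^2 \right] \tag{279} $$

Now, similar to Lemma 16, choose $j_1 \ge j$ and $0 < c_4 < \lambda_2(\overline{L})$, such that,

$$ 1 - \beta(i) \lambda_2(\overline{L}) + c_2^2 \beta^2(i) + 2\beta^2(i) c_2^2 c_3^{1/2} \le 1 - \beta(i) c_4, \quad \forall i \ge j_1 \tag{280} $$

Then for $i \ge j_1$, from eqn. (278)

$$ \mathbb{E}_{\theta^*}\left[ \left\| y(i+1) \right\|^2 \right] \le \left( 1 - \beta(i) c_4 \right) \mathbb{E}_{\theta^*}\left[ \left\| y(i) \right\|^2 \right] + \beta^2(i) \left( c_2^2 c_3 + 2 c_2^2 c_3^{1/2} \right) \tag{281} $$

from which we conclude that $\lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \left\| y(i) \right\|^2 \right] = 0$ by Lemma 18 (see also Lemma 16).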
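The mechanism behind the final recursion (281) can be seen in a scalar simulation: a contraction factor of order $\beta(i)$ combined with a perturbation of order $\beta^2(i)$ drives the iterate to zero. The sketch below is an illustration, not the lemma used in the paper; it assumes $\beta(i) = b/(i+1)^{\tau_2}$ with $.5 < \tau_2 < 1$, treats the bound as an equality, and the name `run_l2_recursion` and all constants are hypothetical.

```python
def run_l2_recursion(u0, b, tau2, c4, c_pert, steps):
    """Iterate u(i+1) = (1 - c4 * beta(i)) * u(i) + c_pert * beta(i)**2.

    beta(i) = b / (i+1)**tau2 is an assumed weight sequence; u(i) stands in
    for the second moment E[||y(i)||^2] appearing in the bound.
    """
    u = u0
    for i in range(steps):
        beta = b / (i + 1) ** tau2
        u = (1.0 - c4 * beta) * u + c_pert * beta ** 2
    return u


if __name__ == "__main__":
    # The iterate shrinks as the horizon grows; for large i it behaves
    # roughly like (c_pert / c4) * beta(i).
    for steps in (100, 10000, 100000):
        print(steps, run_l2_recursion(1.0, 1.0, 0.75, 0.5, 1.0, steps))
```

The perturbation is of smaller order than the contraction, which is why the contribution of the noise terms vanishes in the limit even though $\beta(i) \to 0$.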



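Proof: [Proof of Theorem 15] Consistency follows from the fact that, by Theorem 14, the sequence $\{x(i)\}_{i \ge 0}$ converges a.s. to $1_N \otimes h(\theta^*)$, and the function $h^{-1}(\cdot)$ is continuous.

To establish the second claim, we note that, if $h^{-1}(\cdot)$ is Lipschitz continuous, there exists a constant $k > 0$, such that

$$ \left\| h^{-1}(y_1) - h^{-1}(y_2) \right\| \le k \left\| y_1 - y_2 \right\|, \quad \forall y_1, y_2 \in \mathbb{R}^{M \times 1} \tag{282} $$

Since $\mathcal{L}_2$ convergence implies $\mathcal{L}_1$, we then have from Theorem 14 for $1 \le n \le N$

$$
\begin{aligned}
\lim_{i \to \infty} \left\| \mathbb{E}_{\theta^*}\left[ h^{-1}(x_n(i)) - \theta^* \right] \right\|
&\le \lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \left\| h^{-1}(x_n(i)) - \theta^* \right\| \right] \\
&= \lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \left\| h^{-1}(x_n(i)) - h^{-1}(h(\theta^*)) \right\| \right] \\
&\le k \lim_{i \to \infty} \mathbb{E}_{\theta^*}\left[ \left\| x_n(i) - h(\theta^*) \right\| \right] \\
&= 0
\end{aligned} \tag{283}
$$

which establishes the theorem.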
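The Lipschitz step in eqn. (283) can be illustrated with a small Monte Carlo sketch. Everything specific here is an assumption made for illustration: the scalar case $M = 1$, the map $h(\theta) = 2\theta + \sin\theta$ (whose derivative is at least 1, so $h^{-1}$ is Lipschitz with $k = 1$), additive Gaussian noise standing in for the estimation error, and the helper names `h`, `h_inv` and `bias_and_l1`.

```python
import math
import random


def h(theta):
    """Assumed example transformation; h'(theta) = 2 + cos(theta) >= 1."""
    return 2.0 * theta + math.sin(theta)


def h_inv(x, lo=-100.0, hi=100.0, iters=200):
    """Invert the strictly increasing map h by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bias_and_l1(theta_star, noise_std, samples, seed=0):
    """Return (|mean of h_inv(x) - theta_star|, mean of |x - h(theta_star)|).

    x is drawn as h(theta_star) plus Gaussian noise, a stand-in for x_n(i).
    """
    rng = random.Random(seed)
    target = h(theta_star)
    xs = [target + rng.gauss(0.0, noise_std) for _ in range(samples)]
    bias = abs(sum(h_inv(x) for x in xs) / samples - theta_star)
    l1 = sum(abs(x - target) for x in xs) / samples
    return bias, l1


if __name__ == "__main__":
    # As the L1 error of x shrinks, the bias of h_inv(x) is squeezed to zero,
    # since bias <= k * (L1 error) with k = 1 for this choice of h.
    for std in (1.0, 0.1, 0.01):
        print(std, bias_and_l1(0.7, std, 2000))
```

The bound holds sample by sample, so it also holds for the empirical averages computed here, up to the numerical error of the bisection.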






References 

[1] J. N. Tsitsiklis, "Problems in decentralized decision making and computation," Ph.D., Massachusetts Institute of Technology, Cambridge, MA, 1984.
[2] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Trans. Autom. Control, vol. AC-31, no. 9, pp. 803-812, 1986.
[3] D. Bertsekas, J. Tsitsiklis, and M. Athans, "Convergence theories of distributed iterative processes: A survey," Technical Report for Information and Decision Systems, Massachusetts Inst. of Technology, Cambridge, MA, 1984.
[4] H. Kushner and G. Yin, "Asymptotic properties of distributed and communicating stochastic approximation algorithms," SIAM J. Control and Optimization, vol. 25, no. 5, pp. 1266-1290, Sept. 1987.
[5] R. Olfati-Saber and R. M. Murray, "Consensus problems in networks of agents with switching topology and time-delays," IEEE Trans. Automat. Contr., vol. 49, no. 9, pp. 1520-1533, Sept. 2004.
[6] A. Jadbabaie, J. Lin, and A. S. Morse, "Coordination of groups of mobile autonomous agents using nearest neighbor rules," IEEE Trans. Autom. Control, vol. AC-48, no. 6, pp. 988-1001, June 2003.
[7] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Syst. Contr. Lett., vol. 53, pp. 65-78, 2004.
[8] S. Kar and J. M. F. Moura, "Sensor networks with random links: Topology design for distributed consensus," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3315-3326, July 2008.
[9] S. Kar and J. M. F. Moura, "Distributed consensus algorithms in sensor networks with communication channel noise and random link failures," in 41st Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, Nov. 2007.
[10] S. Kar and J. M. F. Moura, "Distributed average consensus in sensor networks with quantized inter-sensor communication," in Proceedings of the 33rd International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, USA, April 1-4 2008.
[11] Y. Hatano and M. Mesbahi, "Agreement over random networks," in 43rd IEEE Conference on Decision and Control, vol. 2, Dec. 2004, pp. 2010-2015.
[12] T. C. Aysal, M. Coates, and M. Rabbat, "Distributed average consensus using probabilistic quantization," in IEEE/SP 14th Workshop on Statistical Signal Processing, Madison, Wisconsin, USA, August 2007, pp. 640-644.
[13] M. E. Yildiz and A. Scaglione, "Differential nested lattice encoding for consensus problems," in ACM/IEEE Information Processing in Sensor Networks, Cambridge, MA, April 2007.
[14] A. Kashyap, T. Basar, and R. Srikant, "Quantized consensus," Automatica, vol. 43, pp. 1192-1203, July 2007.
[15] P. Frasca, R. Carli, F. Fagnani, and S. Zampieri, "Average consensus on networks with quantized communication," Submitted to the Int. J. Robust and Nonlinear Control, 2008.
[16] A. Nedic, A. Olshevsky, A. Ozdaglar, and J. N. Tsitsiklis, "On distributed averaging algorithms and quantization effects," Technical Report 2778, LIDS-MIT, Nov. 2007.
[17] M. Huang and J. Manton, "Stochastic approximation for consensus seeking: mean square and almost sure convergence," in Proceedings of the 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, Dec. 12-14 2007.
[18] A. Das and M. Mesbahi, "Distributed linear parameter estimation in sensor networks based on Laplacian dynamics consensus algorithm," in 3rd Annual IEEE Communications Society on Sensor and Ad Hoc Communications and Networks, vol. 2, Reston, VA, USA, 28-28 Sept. 2006, pp. 440-449.
[19] I. D. Schizas, A. Ribeiro, and G. B. Giannakis, "Consensus in ad hoc WSNs with noisy links - part I: Distributed estimation of deterministic signals," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 350-364, January 2008.
[20] S. Kar, S. A. Aldosari, and J. M. F. Moura, "Topology for distributed inference on graphs," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2609-2613, June 2008.
[21] U. A. Khan and J. M. F. Moura, "Distributing the Kalman filter for large-scale systems," Accepted for publication, IEEE Transactions on Signal Processing, 2008.
[22] C. G. Lopes and A. H. Sayed, "Diffusion least-mean squares over adaptive networks: Formulation and performance analysis," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3122-3136, July 2008.
[23] S. Stankovic, M. Stankovic, and D. Stipanovic, "Decentralized parameter estimation by consensus based stochastic approximation," in 46th IEEE Conference on Decision and Control, New Orleans, LA, USA, 12-14 Dec. 2007, pp. 1535-1540.






[24] I. Schizas, G. Mateos, and G. Giannakis, "Stability analysis of the consensus-based distributed LMS algorithm," in Proceedings of the 33rd International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, USA, April 1-4 2008, pp. 3289-3292.
[25] S. Ram, V. Veeravalli, and A. Nedic, "Distributed and recursive parameter estimation in parametrized linear state-space models," Submitted for publication, April 2008.
[26] F. R. K. Chung, Spectral Graph Theory. Providence, RI: American Mathematical Society, 1997.
[27] B. Mohar, "The Laplacian spectrum of graphs," in Graph Theory, Combinatorics, and Applications, Y. Alavi, G. Chartrand, O. R. Oellermann, and A. J. Schwenk, Eds. New York: J. Wiley & Sons, 1991, vol. 2, pp. 871-898.
[28] B. Bollobas, Modern Graph Theory. New York, NY: Springer Verlag, 1998.
[29] S. Kar and J. Moura, "Distributed consensus algorithms in sensor networks: Quantized data," November 2007, submitted for publication, 30 pages. [Online]. Available: http://arxiv.org/abs/0712.1609
[30] L. Schuchman, "Dither signals and their effect on quantization noise," IEEE Trans. Commun. Technol., vol. COMM-12, pp. 162-165, December 1964.
[31] S. P. Lipshitz, R. A. Wannamaker, and J. Vanderkooy, "Quantization and dither: A theoretical survey," J. Audio Eng. Soc., vol. 40, pp. 355-375, May 1992.
[32] A. B. Sripad and D. L. Snyder, "A necessary and sufficient condition for quantization errors to be uniform and white," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 442-448, October 1977.
[33] R. M. Gray and T. G. Stockham, "Dithered quantizers," IEEE Trans. Information Theory, vol. 39, pp. 805-811, May 1993.
[34] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, "Randomized gossip algorithms," IEEE/ACM Trans. Netw., vol. 14, no. SI, pp. 2508-2530, 2006.
[35] E. Lehmann, Theory of Point Estimation. John Wiley and Sons, Inc., 1983.
[36] M. Nevel'son and R. Has'minskii, Stochastic Approximation and Recursive Estimation. Providence, Rhode Island: American Mathematical Society, 1973.
[37] O. Kallenberg, Foundations of Modern Probability, 2nd ed. Springer Series in Statistics, 2002.
[38] N. Krasovskii, Stability of Motion. Stanford University Press, 1963.
[39] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Prentice-Hall, 1975.
[40] Y. Chow, "Some convergence theorems for independent random variables," Ann. Math. Statist., vol. 37, pp. 1482-1493, 1966.
[41] Y. Chow and T. Lai, "Limiting behavior of weighted sums of independent random variables," Ann. Prob., vol. 1, pp. 810-824, 1973.
[42] W. Stout, "Some results on the complete and almost sure convergence of linear combinations of independent random variables and martingale differences," Ann. Math. Statist., vol. 39, pp. 1549-1562, 1968.



